SURFACE EXPRESSION LIBRARIES
OF HETEROMERIC RECEPTORS 



BACKGROUND OF THE INVENTION

This invention relates generally to recombinant expression of heteromeric receptors and, more particularly, to expression of such receptors on the surface of filamentous bacteriophage.

Antibodies are heteromeric receptors generated by a vertebrate organism's immune system which bind to an antigen. The molecules are composed of two heavy and two light chains disulfide bonded together. Antibodies have a "Y"-shaped structure, with the antigen binding portion located at the end of both short arms of the Y. The region on the heavy and light chain polypeptides which corresponds to the antigen binding portion is known as the variable region. The differences between antibodies within this region are primarily responsible for the variation in binding specificities between antibody molecules. The binding specificities are a composite of the antigen interactions with both heavy and light chain polypeptides.

The immune system has the capability of generating an almost infinite number of different antibodies. Such a large diversity is generated primarily through recombination to form the variable regions of each chain and through differential pairing of heavy and light chains. The ability to mimic the natural immune system and generate antibodies that bind to any desired molecule is valuable because such antibodies can be used for diagnostic and therapeutic purposes.

Until recently, generation of antibodies against a desired molecule was accomplished only through manipulation of natural immune responses. Methods included classical immunization techniques of laboratory animals and monoclonal antibody production. Generation of monoclonal antibodies is laborious and time consuming; it involves a series of different techniques and is only performed on animal cells. Animal cells have relatively long generation times and, compared to procaryotic cells, require extra precautions to ensure viability of the cultures.

A method for the generation of a large repertoire of diverse antibody molecules in bacteria has been described by Huse et al., Science, 246, 1275-1281 (1989), which is herein incorporated by reference. The method uses bacteriophage lambda as the vector. The lambda vector is a long, linear double-stranded DNA molecule. Production of antibodies using this vector involves the cloning of heavy and light chain populations of DNA sequences into separate vectors. The vectors are subsequently combined randomly to form a single vector which directs the coexpression of heavy and light chains to form antibody fragments. A disadvantage of this method is that undesired combinations of vector portions are brought together when generating the coexpression vector. Although these undesired combinations do not produce viable phage, they do, however, result in a significant loss of sequences from the population and, therefore, a loss in the diversity of combinations that can be obtained between heavy and light chains. Additionally, the lambda phage genome is large compared to the genes that encode the antibody segments. This makes the lambda system inherently more difficult to manipulate than other available vector systems.

There thus exists a need for a method to generate diverse populations of heteromeric receptors which mimics the natural immune system, which is fast and efficient, and which results in only desired combinations without loss of diversity. The present invention satisfies these needs and provides related advantages as well.

SUMMARY OF THE INVENTION

The invention relates to a plurality of cells containing diverse combinations of first and second DNA sequences encoding first and second polypeptides which form a heteromeric receptor, said heteromeric receptors being expressed on the surface of a cell, preferably one which produces filamentous bacteriophage, such as M13. Vectors, cloning systems and methods of making and screening the heteromeric receptors are also provided.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic diagram of the two vectors used for surface expression library construction from heavy and light chain libraries. M13IX30 (Figure 1A) is the vector used to clone the heavy chain sequences (open box). The single-headed arrow represents the Lac p/o expression sequences and the double-headed arrow represents the portion of M13IX30 which is to be combined with M13IX11. The amber stop codon and relevant restriction sites are also shown. M13IX11 (Figure 1B) is the vector used to clone the light chain sequences (hatched box). Thick lines represent the pseudo-wild type and wild type gene VIII (gVIII) sequences. The double-headed arrow represents the portion of M13IX11 which is to be combined with M13IX30. Relevant restriction sites are also shown. Figure 1C shows the joining of vector populations from heavy and light chain libraries to form the functional surface expression vector M13IXHL. Figure 1D shows the generation of a surface expression library in a non-suppressor strain and the production of phage. The phage are used to infect a suppressor strain (Figure 1E) for surface expression and screening of the library.

Figure 2 is the nucleotide sequence of M13IX30 (SEQ ID NO: 1).

Figure 3 is the nucleotide sequence of M13IX11 (SEQ ID NO: 2).

Figure 4 is the nucleotide sequence of M13IX34 (SEQ ID NO: 3).

Figure 5 is the nucleotide sequence of M13IX13 (SEQ ID NO: 4).

Figure 6 is the nucleotide sequence of M13IX60 (SEQ ID NO: 5).

DETAILED DESCRIPTION OF THE INVENTION

This invention is directed to simple and efficient methods to generate a large repertoire of diverse combinations of heteromeric receptors. The method is advantageous in that only proper combinations of vector portions are randomly brought together for the coexpression of different DNA sequences, without loss of population size or diversity. The receptors can be expressed on the surface of cells, such as those producing filamentous bacteriophage, which can be screened in large numbers. The nucleic acid sequences encoding the receptors can be readily characterized because filamentous bacteriophage produce single-stranded DNA, which allows efficient sequencing and mutagenesis. The heteromeric receptors so produced are useful in a wide variety of diagnostic and therapeutic procedures.



In one embodiment, two populations of diverse heavy (Hc) and light (Lc) chain sequences are synthesized by polymerase chain reaction (PCR). These populations are cloned into separate M13-based vectors containing elements necessary for expression. The heavy chain vector contains a gene VIII (gVIII) coat protein sequence so that translation of the Hc sequences produces gVIII-Hc fusion proteins. The populations of the two vectors are randomly combined such that only the vector portions containing the Hc and Lc sequences are joined into a single circular vector. The combined vector directs the coexpression of both Hc and Lc sequences for assembly of the two polypeptides and surface expression on M13. A mechanism also exists to control the expression of gVIII-Hc fusion proteins during library construction and screening.
As used herein, the term "heteromeric receptors" refers to proteins composed of two or more subunits which together exhibit binding activity toward a particular molecule. It is understood that the term includes fragments of the subunits so long as assembly of the polypeptides and function of the assembled complex are retained. Heteromeric receptors include, for example, antibodies and fragments thereof such as Fab and (Fab)2 portions, T cell receptors, integrins, hormone receptors and transmitter receptors.

As used herein, the term "preselected molecule" refers to a molecule which is chosen from a number of choices. The molecule can be, for example, a protein or peptide, or an organic molecule such as a drug. Benzodiazapam is a specific example of a preselected molecule.

As used herein, the term "coexpression" refers to the expression of two or more nucleic acid sequences usually expressed as separate polypeptides. For heteromeric receptors, the coexpressed polypeptides assemble to form the heteromer. Accordingly, "expression elements" as used herein refers to sequences necessary for the transcription, translation, regulation and sorting of the expressed polypeptides which make up the heteromeric receptors. The term "coexpression" also includes the expression of two subunit polypeptides which are linked but are able to assemble into a heteromeric receptor. A specific example of coexpression of linked polypeptides is where Hc and Lc polypeptides are expressed with a flexible peptide or polypeptide linker joining the two subunits into a single chain. The linker should be flexible enough to allow association of the Hc and Lc portions into a functional Fab fragment.

The invention provides for a composition of matter comprising a plurality of procaryotic cells containing diverse combinations of first and second DNA sequences encoding first and second polypeptides which form a heteromeric receptor exhibiting binding activity toward a preselected molecule, said heteromeric receptors being expressed on the surface of filamentous bacteriophage.

DNA sequences encoding the polypeptides of heteromeric receptors are obtained by methods known to one skilled in the art. Such methods include, for example, cDNA synthesis and polymerase chain reaction (PCR). The need will determine which method or combination of methods is to be used to obtain the desired populations of sequences. Expression can be performed in any compatible vector/host system. Such systems include, for example, plasmids or phagemids in procaryotes such as E. coli, yeast systems and other eucaryotic systems such as mammalian cells. Expression is described herein in the context of the presently preferred embodiment, i.e. expression on the surface of filamentous bacteriophage. Filamentous bacteriophage include, for example, M13, f1 and fd. Additionally, the heteromeric receptors can also be expressed in soluble or secreted form depending on the need and the vector/host system employed.
Expression of heteromeric receptors such as antibodies or functional fragments thereof on the surface of M13 can be accomplished, for example, using the vector system shown in Figure 1. Construction of the vectors, in sufficient detail to enable one of ordinary skill to make them, is set out explicitly in Example I. The complete nucleotide sequences are given in Figures 2 and 3 (SEQ ID NOS: 1 and 2). This system produces randomly combined populations of heavy (Hc) and light (Lc) chain antibody fragments functionally linked to expression elements. The Hc polypeptide is produced as a fusion protein with the M13 coat protein encoded by gene VIII. The gVIII-Hc fusion protein therefore anchors the assembled Hc and Lc polypeptides on the surface of M13. The diversity of Hc and Lc combinations obtained by this system can be 5 x 10^7 or greater. Diversity of less than 5 x 10^7 can also be obtained and will be determined by the need and type of heteromeric receptor to be expressed.
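The diversity actually obtained depends on how many transformants are recovered relative to the number of possible pairings. As a rough illustration only, the sketch below assumes (our assumption, not a statement of the method) that each transformant carries an independent, uniformly random pairing of one of n_heavy Hc sequences with one of n_light Lc sequences. Under that model the expected number of distinct pairings among N transformants is M(1 - (1 - 1/M)^N), where M = n_heavy x n_light. The function name and the population sizes are hypothetical.

```python
def expected_distinct_pairings(n_heavy: int, n_light: int, n_transformants: int) -> float:
    """Expected number of distinct Hc/Lc pairings among n_transformants clones.

    Hypothetical model: every clone carries one independent, uniformly random
    pairing of one of n_heavy heavy chains with one of n_light light chains.
    """
    possible = n_heavy * n_light  # M, the number of possible pairings
    # Probability that one particular pairing is never drawn in N clones.
    p_missing = (1.0 - 1.0 / possible) ** n_transformants
    return possible * (1.0 - p_missing)


if __name__ == "__main__":
    # Illustrative (hypothetical) sizes: 10^4 heavy x 10^4 light = 10^8 possible pairings.
    for n in (10**6, 5 * 10**7, 10**9):
        d = expected_distinct_pairings(10**4, 10**4, n)
        print(f"{n:>13,} transformants -> about {d:,.0f} distinct pairings")
```

Under this model only about 63% (1 - 1/e) of the possible pairings are sampled when the number of transformants equals M, so library size, rather than the number of possible pairings, tends to limit the diversity recovered.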

Populations of Hc and Lc encoding sequences to be combined into a vector for coexpression are each cloned into separate vectors. For the vectors shown in Figure 1, diverse populations of sequences encoding Hc polypeptides are cloned into M13IX30 (SEQ ID NO: 1). Sequences encoding Lc polypeptides are cloned into M13IX11 (SEQ ID NO: 2). The populations are inserted between the Xho I-Spe I or Stu I restriction enzyme sites in M13IX30 and between the Sac I-Xba I or Eco RV sites in M13IX11 (Figures 1A and 1B, respectively).

The populations of Hc and Lc sequences inserted into the vectors can be synthesized with appropriate restriction recognition sequences flanking opposite ends of the encoding sequences, but this is not necessary. The sites allow annealing and in-frame ligation of these sequences with the expression elements of a double-stranded vector restricted with the appropriate restriction enzyme. Alternatively, and in a preferred embodiment, the Hc and Lc sequences can be inserted into the vector without restriction of the DNA. This method of cloning is beneficial because naturally encoded restriction enzyme sites may be present within the sequences, thus causing destruction of the sequence when treated with a restriction enzyme. For cloning without restriction, the sequences are treated briefly with a 3' to 5' exonuclease such as T4 DNA polymerase or exonuclease III. A 5' to 3' exonuclease will also accomplish the same function. The remaining single-stranded sequences should be complementary to the single-stranded overhangs within the vector which remain after restriction at the cloning site and treatment with exonuclease. The exonuclease-treated inserts are annealed with the restricted vector by methods known to one skilled in the art. The exonuclease method decreases background and is easier to perform.
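The exonuclease cloning step relies on the single-stranded ends left on the insert being complementary to those left on the restricted vector. The sketch below models only the 3' to 5' exonuclease case: removing k nucleotides from each 3' end of a duplex leaves k-nucleotide 5' overhangs, and the insert anneals to the opened vector only if its terminal sequences match the vector sequence flanking the cloning site. The function names, the fixed resection length k and all sequences are hypothetical; real exonuclease resection is neither uniform nor exact.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_overhangs(top: str, k: int) -> Tuple[str, str]:
    """5' overhangs on a blunt insert after k nt are removed from each 3' end.

    Returns (left, right), each written 5'->3': the left overhang is on the
    top strand, the right overhang on the bottom strand.
    """
    return top[:k], revcomp(top)[:k]


def vector_overhangs(upstream: str, downstream: str, k: int) -> Tuple[str, str]:
    """5' overhangs on an opened vector after the same resection.

    upstream/downstream are the top-strand sequences (5'->3') flanking the
    cloning site. The upstream arm loses the 3' end of its top strand and so
    exposes bottom-strand sequence; the downstream arm exposes top strand.
    """
    return revcomp(upstream[-k:]), downstream[:k]


def anneals(a: str, b: str) -> bool:
    """Two antiparallel single strands pair if one is the reverse complement of the other."""
    return a == revcomp(b)


def can_clone(insert_top: str, upstream: str, downstream: str, k: int) -> bool:
    ins_left, ins_right = insert_overhangs(insert_top, k)
    vec_left, vec_right = vector_overhangs(upstream, downstream, k)
    return anneals(ins_left, vec_left) and anneals(ins_right, vec_right)


if __name__ == "__main__":
    upstream, downstream = "AAACCTGA", "GGCATTT"   # hypothetical vector flanks
    matching = "CTGA" + "TTTT" + "GGCA"             # insert ends match the flanks
    mismatched = "CTGT" + "TTTT" + "GGCA"           # one changed base at the left end
    print(can_clone(matching, upstream, downstream, 4))    # True
    print(can_clone(mismatched, upstream, downstream, 4))  # False
```

No ligation step is modeled; the sketch only checks whether the exposed single strands can pair, which is the property the annealing step depends on.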

The vector used for Hc populations, M13IX30 (Figure 1A; SEQ ID NO: 1), contains, in addition to expression elements, a sequence encoding the pseudo-wild type gVIII product downstream and in frame with the cloning sites. This gene encodes the wild type M13 gVIII amino acid sequence but has been changed at the nucleotide level to reduce homologous recombination with the wild type gVIII contained on the same vector. The wild type gVIII is present to ensure that at least some functional, non-fusion coat protein will be produced. The inclusion of a wild type gVIII therefore reduces the possibility of non-viable phage production and biological selection against certain peptide fusion proteins. Differential regulation of the two genes can also be used to control the relative ratio of the pseudo and wild type proteins.

WO 92/06204 PCT/US91/07149

Also contained downstream and in frame with the cloning sites is an amber stop codon, located between the inserted Hc sequences and the gVIII sequence. As with the wild type gVIII, the amber stop codon also reduces biological selection when combining vector portions to produce functional surface expression vectors. This is accomplished by using a non-suppressor (sup0) host strain, because non-suppressor strains terminate expression after the Hc sequences but before the pseudo gVIII sequences. Therefore, the pseudo gVIII will essentially never be expressed on the phage surface under these circumstances. Instead, only soluble Hc polypeptides will be produced. Expression in a non-suppressor host strain can be advantageously utilized when one wishes to produce large populations of antibody fragments. Stop codons other than amber, such as opal and ochre, or molecular switches, such as inducible repressor elements, can also be used to unlink peptide expression from surface expression.
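The effect of the in-frame amber codon can be illustrated with a toy translation. The sketch assumes a supE-type amber suppressor, which inserts glutamine at UAG; the document does not name the suppressor used. The codon table covers only the codons in the made-up construct, and the short HC and GVIII strings are hypothetical stand-ins, not the real heavy chain or gene VIII sequences.

```python
# Toy codon table: only the codons used in the made-up construct below.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "GAA": "E", "AAA": "K",
    "GGT": "G", "TCT": "S", "TAA": "*", "TAG": "*",
}


def translate(orf: str, amber_suppressor: bool = False) -> str:
    """Translate codon by codon from the first base until a stop codon is read.

    In a non-suppressor host every stop codon ends translation. In a host
    carrying a supE-type amber suppressor (an assumption here), TAG is read
    as glutamine (Q) instead.
    """
    protein = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon == "TAG" and amber_suppressor:
            protein.append("Q")
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical construct: Hc stand-in, in-frame amber codon, gene VIII stand-in, ochre stop.
HC = "ATGGCTGAAAAA"
GVIII = "GCTGAAGGTTCT"
construct = HC + "TAG" + GVIII + "TAA"

print(translate(construct))                         # non-suppressor: MAEK (soluble Hc only)
print(translate(construct, amber_suppressor=True))  # suppressor: MAEKQAEGS (fusion)
```

The document notes that opal or ochre codons, or inducible switches, could serve the same purpose; only the amber case is modeled here.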

The vector used for Lc populations, M13IX11 (SEQ ID NO: 2), contains the necessary expression elements and cloning sites for the Lc sequences (Figure 1B). As with M13IX30, upstream and in frame with the cloning sites is a leader sequence for sorting to the phage surface. Additionally, a ribosome binding site and Lac Z promoter/operator elements are present for transcription and translation of the DNA sequences.






Both vectors contain two pairs of Mlu I-Hind III restriction enzyme sites (Figures 1A and 1B) for joining together the Hc and Lc encoding sequences and their associated vector sequences. Mlu I and Hind III are non-compatible restriction sites. The two pairs are symmetrically oriented about the cloning site so that only the vector portions containing the sequences to be expressed are exactly combined into a single vector. The two pairs of sites are oriented identically with respect to one another on both vectors, and the DNA between the two sites of each pair must be sufficiently homologous between both vectors to allow annealing. This orientation allows cleavage of each circular vector into two portions and combination of essential components within each vector into a single circular vector where the encoded polypeptides can be coexpressed (Figure 1C).
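Why two pairs of different, identically oriented sites yield only the desired product can be checked with a toy model. The sketch assumes a clockwise site order of Hind III, Mlu I, Mlu I, Hind III on both vectors, with the Lc cassette between the two Mlu I sites of M13IX11 and the Hc cassette between the two Hind III sites of M13IX30; this layout, the feature labels and the function names are our assumptions, since Figure 1 is not reproduced here. As in the combining step of this description, the Hc vector is cut with Mlu I and the Lc vector with Hind III, and a junction forms only where both fragment ends carry the same homologous region.

```python
from typing import List, Optional, Tuple

# Hypothetical layout, written clockwise. "H1"/"H2" mark Hind III sites,
# "M1"/"M2" mark Mlu I sites, and "r1"/"r2" are the stretches of DNA between
# the two sites of each pair, homologous in both vectors.
HC_VECTOR = ["H1", "r1", "M1", "M13IX30 arc (not recovered)", "M2", "r2", "H2", "Hc + M13IX30 arc"]
LC_VECTOR = ["H1", "r1", "M1", "Lc + M13IX11 arc", "M2", "r2", "H2", "M13IX11 arc (not recovered)"]
REGIONS = {"r1", "r2"}


def cut(circle: List[str], sites: set) -> List[List[str]]:
    """Cut a circular feature list at the named sites and return the linear fragments."""
    n = len(circle)
    cuts = [i for i, feature in enumerate(circle) if feature in sites]
    fragments = []
    for j, start in enumerate(cuts):
        end = cuts[(j + 1) % len(cuts)]
        fragment, i = [], (start + 1) % n
        while i != end:  # walk clockwise from one cut to the next
            fragment.append(circle[i])
            i = (i + 1) % n
        fragments.append(fragment)
    return fragments


def anneal(a: List[str], b: List[str]) -> Optional[List[str]]:
    """Close two fragments into a circle if both junctions share a homologous region.

    After 3'->5' resection, an end carrying region r can pair only with another
    end carrying the same region r on the opposite strand.
    """
    if a[-1] in REGIONS and a[-1] == b[0] and b[-1] in REGIONS and b[-1] == a[0]:
        return a + b[1:-1]  # each shared region appears once in the product
    return None


def canonical(circle: List[str]) -> Tuple[str, ...]:
    """Rotate a circle to start at H1 so identical products compare equal."""
    i = circle.index("H1")
    return tuple(circle[i:] + circle[:i])


def combine() -> set:
    pool = cut(HC_VECTOR, {"M1", "M2"}) + cut(LC_VECTOR, {"H1", "H2"})
    products = set()
    for a in pool:
        for b in pool:
            joined = anneal(a, b)
            if joined is not None:
                products.add(canonical(joined))
    return products


if __name__ == "__main__":
    for circle in combine():
        print(" -> ".join(circle))
```

In this model the only product carries both cassettes. The fragments holding the discarded arcs have no homologous region at their ends, and two fragments from the same vector would have to join r1 to r2, so neither parental nor same-vector circles can form.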

Any two pairs of restriction enzyme sites can be used so long as they are symmetrically oriented about the cloning site and identically oriented on both vectors. The sites within each pair, however, should be non-identical or able to be made differentially recognized as a cleavage substrate. For example, the two pairs of restriction sites contained within the vectors shown in Figure 1 are Mlu I and Hind III sites, which are differentially cleavable by Mlu I and Hind III, respectively. One skilled in the art knows how to substitute alternative pairs of restriction enzyme sites for the Mlu I-Hind III pairs described above. Also, instead of two Hind III and two Mlu I sites, a Hind III and Not I site can be paired with a Mlu I and a Sal I site, for example.

The combining step randomly brings together different Hc and Lc encoding sequences within the two diverse populations into a single vector (Figure 1C; M13IXHL). The vector sequences donated from each independent vector, M13IX30 and M13IX11, are necessary for production of viable phage. Also, since the pseudo gVIII sequences are contained in M13IX30, coexpression of functional antibody fragments as Lc-associated gVIII-Hc fusion proteins cannot be accomplished on the phage surface until the vector sequences are linked as shown in M13IXHL.

The combining step is performed by restricting each population of Hc and Lc containing vectors with Mlu I and Hind III, respectively. The 3' termini of each restricted vector population are digested with a 3' to 5' exonuclease as described above for inserting sequences into the cloning sites. The vector populations are mixed, allowed to anneal and introduced into an appropriate host. A non-suppressor host (Figure 1D) is preferably used during initial construction of the library to ensure that sequences are not selected against due to expression as fusion proteins. Phage isolated from the library constructed in a non-suppressor strain can be used to infect a suppressor strain for surface expression of antibody fragments.

The invention also provides a method for selecting a heteromeric receptor exhibiting binding activity toward a preselected molecule from a population of diverse heteromeric receptors, comprising: (a) operationally linking to a first vector a first population of diverse DNA sequences encoding a diverse population of first polypeptides, said first vector having two pairs of restriction sites symmetrically oriented about a cloning site; (b) operationally linking to a second vector a second population of diverse DNA sequences encoding a diverse population of second polypeptides, said second vector having two pairs of restriction sites symmetrically oriented about a cloning site in an identical orientation to that of the first vector; (c) combining the vector products of steps (a) and (b) under conditions which allow only the operational combination of vector sequences containing said first and second DNA sequences; (d) introducing said population of combined vectors into a compatible host under conditions sufficient for expressing said population of first and second DNA sequences; and (e) determining the heteromeric receptors which bind to said preselected molecule. The invention further provides for determining the nucleic acid sequences encoding such polypeptides.

Surface expression of the antibody library is performed in an amber suppressor strain. As described above, the amber stop codon between the Hc sequence and the gVIII sequence unlinks the two components in a non-suppressor strain. Isolating the phage produced in the non-suppressor strain and using them to infect a suppressor strain will link the Hc sequences to the gVIII sequence during expression (Figure 1E). Culturing the suppressor strain after infection allows expression on the surface of M13 of all antibody species within the library as gVIII fusion proteins (gVIII-Fab fusion proteins). Alternatively, the DNA can be isolated from the non-suppressor strain and then introduced into a suppressor strain to accomplish the same effect.

The level of expression of gVIII-Fab fusion proteins can additionally be controlled at the transcriptional level. Both polypeptides of the gVIII-Fab fusion proteins are under the inducible control of the Lac Z promoter/operator system. Other inducible promoters can work as well and are known by one skilled in the art. For high levels of surface expression, the suppressor library is cultured in an inducer of the Lac Z promoter such as isopropylthio-β-galactoside (IPTG). Inducible control is beneficial because biological selection against non-functional gVIII-Fab fusion proteins can be minimized by culturing the library under non-expressing conditions. Expression can then be induced only at the time of screening to ensure that the entire population of antibodies within the library is accurately represented on the phage surface. Inducible control can also be used to regulate the valency of the antibody on the phage surface.
The surface expression library is screened for specific Fab fragments which bind preselected molecules by standard affinity isolation procedures. Such methods include, for example, panning, affinity chromatography and solid phase blotting procedures. Panning as described by Parmley and Smith, Gene 73:305-318 (1988), which is incorporated herein by reference, is preferred because high titers of phage can be screened easily, quickly and in small volumes. Furthermore, this procedure can select minor Fab fragment species within the population, which otherwise would have been undetectable, and amplify them to substantially homogeneous populations. The selected Fab fragments can be characterized by sequencing the nucleic acids encoding the polypeptides after amplification of the phage population.
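How panning can bring a minor Fab species to near homogeneity can be illustrated with a simple enrichment model. The sketch assumes (hypothetically) that each round retains phage displaying a specific binder with a fixed recovery and all other phage with a much lower background recovery, and that amplification between rounds preserves relative proportions. The starting frequency and recovery values are invented for illustration and are not data from this document.

```python
def binder_fraction(f0: float, binder_recovery: float,
                    background_recovery: float, rounds: int) -> float:
    """Fraction of specific binders after a number of panning rounds.

    Hypothetical model: per round, binders are retained with probability
    binder_recovery and all other phage with probability background_recovery;
    amplification between rounds does not change relative proportions.
    """
    binders = f0 * binder_recovery ** rounds
    others = (1.0 - f0) * background_recovery ** rounds
    return binders / (binders + others)


if __name__ == "__main__":
    # Illustrative values only: one binder per 10^6 phage, 10% vs 0.01% recovery.
    for r in range(5):
        print(r, f"{binder_fraction(1e-6, 0.1, 1e-4, r):.4g}")
```

With a 1000-fold enrichment per round in this model, a species present at one in a million makes up about half of the population after two rounds and nearly all of it after three.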

The following examples are intended to illustrate but not limit the invention.

EXAMPLE I

Construction, Expression and Screening of Antibody Fragments on the Surface of M13

This example shows the synthesis of a diverse population of heavy (Hc) and light (Lc) chain antibody fragments and their expression on the surface of M13 as gene VIII-Fab fusion proteins. The expressed antibodies derive from the random mixing and coexpression of Hc and Lc pairs. Also demonstrated is the isolation and characterization of the expressed Fab fragments which bind benzodiazapam (BDP), and of their corresponding nucleotide sequences.

Isolation of mRNA and PCR Amplification of Antibody Fragments
The surface expression library is constructed from mRNA isolated from a mouse that had been immunized with KLH-coupled benzodiazapam (BDP). BDP was coupled to keyhole limpet hemocyanin (KLH) using the techniques described in Antibodies: A Laboratory Manual, Harlow and Lane, eds., Cold Spring Harbor, New York (1988), which is incorporated herein by reference. Briefly, 10.0 milligrams (mg) of keyhole limpet hemocyanin and 0.5 mg of BDP with glutaryl spacer and N-hydroxysuccinimide linker appendages were used. Coupling was performed as in Janda et al., Science 241:1188 (1988), which is incorporated herein by reference. The KLH-BDP conjugate was purified by gel filtration chromatography through Sephadex G-25.

The KLH-BDP conjugate was prepared for injection into mice by adding 100 µg of the conjugate to 250 µl of phosphate buffered saline (PBS). An equal volume of complete Freund's adjuvant was added and the entire solution was emulsified for 5 minutes. Mice were injected with 300 µl of the emulsion. Injections were given subcutaneously at several sites using a 21 gauge needle. A second immunization with BDP was given two weeks later. This injection was prepared as follows: 50 µg of BDP was diluted in 250 µl of PBS and an equal volume of alum was mixed with the solution. The mice were injected intraperitoneally with 500 µl of the solution using a 23 gauge needle. One month later the mice were given a final injection of 50 µg of the conjugate diluted to 200 µl in PBS. This injection was given intravenously in the lateral tail vein using a 30 gauge needle. Five days after this final injection the mice were sacrificed and total cellular RNA was isolated from their spleens.

Total RNA was isolated from the spleen of a single mouse immunized as described above by the method of Chomczynski and Sacchi, Anal. Biochem., 162:156-159 (1987), which is incorporated herein by reference. Briefly, immediately after removing the spleen from the immunized mouse, the tissue was homogenized in 10 ml of a denaturing solution containing 4.0 M guanidine isothiocyanate, 0.25 M sodium citrate at pH 7.0, and 0.1 M 2-mercaptoethanol using a glass homogenizer. One ml of 2 M sodium acetate at pH 4.0 was mixed with the homogenized spleen. One ml of saturated phenol was also mixed with the denaturing solution containing the homogenized spleen. Two ml of a chloroform:isoamyl alcohol (24:1 v/v) mixture was added to this homogenate. The homogenate was mixed vigorously for ten seconds and maintained on ice for 15 minutes. The homogenate was then transferred to a thick-walled 50 ml polypropylene centrifuge tube (Fisher Scientific Company, Pittsburgh, PA). The solution was centrifuged at 10,000 x g for 20 minutes at 4°C. The upper RNA-containing aqueous layer was transferred to a fresh 50 ml polypropylene centrifuge tube and mixed with an equal volume of isopropyl alcohol. This solution was maintained at -20°C for at least one hour to precipitate the RNA. The solution containing the precipitated RNA was centrifuged at 10,000 x g for twenty minutes at 4°C. The pelleted total cellular RNA was collected and dissolved in 3 ml of the denaturing solution described above. Three ml of isopropyl alcohol was added to the resuspended total cellular RNA and vigorously mixed. This solution was maintained at -20°C for at least 1 hour to precipitate the RNA. The solution containing the precipitated RNA was centrifuged at 10,000 x g for ten minutes at 4°C. The pelleted RNA was washed once with a solution containing 75% ethanol. The pelleted RNA was dried under vacuum for 15 minutes and then resuspended in diethyl pyrocarbonate (DEPC)-treated H2O (DEPC-H2O).

Poly A+ RNA for use in first strand cDNA synthesis was prepared from the above isolated total RNA using a spin-column kit (Pharmacia, Piscataway, NJ) as recommended by the manufacturer. The basic methodology has been described by Aviv and Leder, Proc. Natl. Acad. Sci. USA, 69:1408-1412 (1972), which is incorporated herein by reference. Briefly, one half of the total RNA isolated from a single immunized mouse spleen prepared as described above was resuspended in one ml of DEPC-treated dH2O and maintained at 65°C for five minutes. One ml of 2x high salt loading buffer (100 mM Tris-HCl at pH 7.5, 1 M sodium chloride, 2.0 mM disodium ethylene diamine tetraacetic acid (EDTA) at pH 8.0, and 0.2% sodium dodecyl sulfate (SDS)) was added to the resuspended RNA and the mixture was allowed to cool to room temperature. The mixture was then applied to an oligo-dT (Collaborative Research Type 2 or Type 3, Bedford, MA) column that was previously prepared by washing the oligo-dT with a solution containing 0.1 M sodium hydroxide and 5 mM EDTA and then equilibrating the column with DEPC-treated dH2O. The eluate was collected in a sterile polypropylene tube and reapplied to the same column after heating the eluate for 5 minutes at 65°C. The oligo-dT column was then washed with 2 ml of high salt loading buffer consisting of 50 mM Tris-HCl at pH 7.5, 500 mM sodium chloride, 1 mM EDTA at pH 8.0 and 0.1% SDS. The oligo-dT column was then washed with 2 ml of 1x medium salt buffer (50 mM Tris-HCl at pH 7.5, 100 mM sodium chloride, 1 mM EDTA at pH 8.0 and 0.1% SDS). The mRNA was eluted with 1 ml of buffer consisting of 10 mM Tris-HCl at pH 7.5, 1 mM EDTA at pH 8.0 and 0.05% SDS. The messenger RNA was purified by extracting this solution with phenol/chloroform followed by a single extraction with 100% chloroform, ethanol precipitated and resuspended in DEPC-treated H2O.
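Because one ml of 2x high salt loading buffer is added to one ml of resuspended RNA, every buffer component is diluted twofold (C1V1 = C2V2). The short sketch below makes that arithmetic explicit; the function name is hypothetical, and SDS is carried as a percentage rather than a molar concentration.

```python
def after_mixing(stock: dict, stock_ml: float, other_ml: float) -> dict:
    """Final concentrations when stock_ml of a buffer is mixed with other_ml of a
    solution containing none of its components (C1 * V1 = C2 * V2)."""
    factor = stock_ml / (stock_ml + other_ml)
    return {component: value * factor for component, value in stock.items()}


# 2x high salt loading buffer, as listed in the text.
loading_buffer_2x = {
    "Tris-HCl pH 7.5 (mM)": 100.0,
    "NaCl (mM)": 1000.0,
    "EDTA pH 8.0 (mM)": 2.0,
    "SDS (%)": 0.2,
}

if __name__ == "__main__":
    for component, value in after_mixing(loading_buffer_2x, 1.0, 1.0).items():
        print(f"{component}: {value:g}")
```

The result, 50 mM Tris-HCl, 500 mM NaCl, 1 mM EDTA and 0.1% SDS, is the 1x strength of the same buffer.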

In preparation for PCR amplification, mRNA was used as a template for cDNA synthesis. In a typical 250 µl reverse transcription reaction mixture, 5-10 µg of spleen mRNA in water was first annealed with 500 ng (0.5 pmol) of either the 3' VH primer (primer 12, Table I) or the 3' VL primer (primer 9, Table II) at 65°C for 5 minutes. Subsequently, the mixture was adjusted to contain 0.8 mM dATP, 0.8 mM dCTP, 0.8 mM dGTP, 0.8 mM dTTP, 100 mM Tris-HCl, 10 mM MgCl2, 40 mM KCl, and 20 mM 2-mercaptoethanol. Moloney murine leukemia virus reverse transcriptase (Bethesda Research Laboratories (BRL), Gaithersburg, MD), 25 units, was added and the solution was incubated for 1 hour at 40°C. The resultant first strand cDNA was phenol extracted, ethanol precipitated and then used in the polymerase chain reaction (PCR) procedures described below for amplification of heavy and light chain sequences.



Primers used for amplification of heavy chain Fd fragments are listed in Table I. Amplification was performed in eight separate reactions, as described by Saiki et al., Science 239:487-491 (1988), which is incorporated herein by reference. Each reaction contained one of the 5' primers (primers 2 to 9; SEQ ID NOS: 7 through 14, respectively) and one of the 3' primers (primer 12; SEQ ID NO: 17), listed in Table I. The remaining 5' primers used for amplification are either a degenerate primer (primer 1; SEQ ID NO: 6) or a primer that incorporates inosine at four degenerate positions (primer 10; SEQ ID NO: 15). The remaining primer (primer 11; SEQ ID NO: 16) was used to construct Fv fragments. The underlined portion of the 5' primers incorporates an Xho I site and that of the 3' primers a Spe I restriction site for cloning the amplified fragments into the M13IX30 vector in a predetermined reading frame for expression.



TABLE I
HEAVY CHAIN PRIMERS

1) 5'-AGGT(C/G)(C/A)A(G/A)CT(G/T)CTCGAGTC(T/A)GG-3'
2) 5'-AGGTCCAGCTGCTCGAGTCTGG-3'
3) 5'-AGGTCCAGCTGCTCGAGTCAGG-3'
4) 5'-AGGTCCAGCTTCTCGAGTCTGG-3'
5) 5'-AGGTCCAGCTTCTCGAGTCAGG-3'
6) 5'-AGGTCCAACTGCTCGAGTCTGG-3'
7) 5'-AGGTCCAACTGCTCGAGTCAGG-3'
8) 5'-AGGTCCAACTTCTCGAGTCTGG-3'
9) 5'-AGGTCCAACTTCTCGAGTCAGG-3'
10) 5'-AGGTIIAICTICTCGAGTC(T/A)GG-3'
11) 5'-CTATTAACTAGTAACGGTAACAGTGGTGCCTTGCCCCA-3'
12) 5'-AGGCTTACTAGTACAATCCCTGGGCACAAT-3'
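Primer 1 of Table I is a degenerate mixture rather than a single sequence. The Python sketch below expands it. It assumes the degenerate positions are written with standard IUPAC ambiguity codes (S = C/G, M = A/C, R = A/G, K = G/T, W = A/T). That encoding, and the helper name `expand_degenerate`, are illustrative choices and do not come from the original text. Under that reading, the mixture holds 32 distinct sequences and contains each of the specific primers 2 to 9. The Xho I site CTCGAG stays fixed in every member.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Primer 1 of Table I rewritten in IUPAC codes (assumed encoding of
# AGGT(C/G)(C/A)A(G/A)CT(G/T)CTCGAGTC(T/A)GG).
PRIMER_1 = "AGGTSMARCTKCTCGAGTCWGG"

# The specific 5' primers 2 to 9 of Table I.
PRIMERS_2_TO_9 = [
    "AGGTCCAGCTGCTCGAGTCTGG", "AGGTCCAGCTGCTCGAGTCAGG",
    "AGGTCCAGCTTCTCGAGTCTGG", "AGGTCCAGCTTCTCGAGTCAGG",
    "AGGTCCAACTGCTCGAGTCTGG", "AGGTCCAACTGCTCGAGTCAGG",
    "AGGTCCAACTTCTCGAGTCTGG", "AGGTCCAACTTCTCGAGTCAGG",
]

variants = expand_degenerate(PRIMER_1)
print(len(variants))                               # degeneracy of primer 1
print(all(p in variants for p in PRIMERS_2_TO_9))  # primers 2-9 are members
print(all("CTCGAG" in v for v in variants))        # Xho I site in every variant
```

The same helper can be pointed at any other degenerate oligonucleotide to see how many distinct species a single synthesis actually contains.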

Primers for amplification of mouse kappa light chain sequences for construction of the M13IX11 library are shown in Table II. These primers were chosen to contain restriction sites which were compatible with the vector and not present in the conserved sequences of the mouse light chain mRNA. Amplification was performed as described above in five separate reactions, each containing one of the 5' primers (primers 3 to 7; SEQ ID NOS: 20 through 24, respectively) and one of the 3' primers (primer 9; SEQ ID NO: 26) listed in Table II. The remaining 3' primer (primer 8; SEQ ID NO: 25) was used to construct Fv fragments. The underlined portion of the 5' primers depicts a Sac I restriction site and that of the 3' primers an Xba I restriction site for cloning of the amplified fragments into the M13IX11 vector in a predetermined reading frame for expression.

TABLE II
LIGHT CHAIN PRIMERS

1) 5'-CCAGTTCCGAGCTCGTTGTGACTCAGGAATCT-3'
2) 5'-CCAGTTCCGAGCTCGTGTTGACGCAGCCGCCC-3'
3) 5'-CCAGTTCCGAGCTCGTGCTCACCCAGTCTCCA-3'
4) 5'-CCAGTTCCGAGCTCCAGATGACCCAGTCTCCA-3'
5) 5'-CCAGATGTGAGCTCGTGATGACCCAGACTCCA-3'
6) 5'-CCAGATGTGAGCTCGTCATGACCCAGTCTCCA-3'
7) 5'-CCAGTTCCGAGCTCGTGATGACACAGTCTCCA-3'
8) 5'-GCAGCATTCTAGAGTTTCAGCTCCAGCTTGCC-3'
9) 5'-GCGCCGTCTAGAATTAACACTCATTCCTGTTGAA-3'
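The cloning scheme depends on each light chain 5' primer carrying a Sac I site and each 3' primer carrying an Xba I site. The heavy chain primers of Table I carry Xho I and Spe I sites instead. The short Python sketch below reports where those sites fall in a primer. It uses the standard recognition sequences (Sac I GAGCTC, Xba I TCTAGA, Xho I CTCGAG, Spe I ACTAGT). The function name `find_sites` is hypothetical. All four sites are palindromic, so scanning one strand is enough.

```python
# Standard recognition sequences (5' to 3') for the enzymes named in the text.
# All four are palindromes, so a single-strand scan finds every site.
SITES = {
    "Sac I": "GAGCTC",
    "Xba I": "TCTAGA",
    "Xho I": "CTCGAG",
    "Spe I": "ACTAGT",
}

def find_sites(seq: str) -> dict[str, list[int]]:
    """Map each enzyme found in seq to the 0-based start positions of its site."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits

# Light chain 5' primer 3 and 3' primer 9 from Table II.
primer_3 = "CCAGTTCCGAGCTCGTGCTCACCCAGTCTCCA"
primer_9 = "GCGCCGTCTAGAATTAACACTCATTCCTGTTGAA"

print(find_sites(primer_3))
print(find_sites(primer_9))
```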
PCR amplification for heavy and light chain fragments was performed in a 100 µl reaction mixture containing the above described products of the reverse transcription reaction (about 5 µg of the cDNA-RNA hybrid), the 3' VH primer and one of the 5' VH primers (primers 2-9, Table I; SEQ ID NOS: 7 through 14, respectively) for heavy chain amplification, or [...] nmol of the 3' VL primer (primer 9, Table II; SEQ ID NO: 26) and one of the 5' VL primers (primers 3-7, Table II; SEQ ID NOS: 20 through 24, respectively) for each light chain amplification, a mixture of dNTPs at 200 µM, 50 mM KCl, [...] units of Thermus aquaticus DNA polymerase. The reaction mixture was overlaid with mineral oil and subjected to 40 cycles of amplification. Each amplification cycle involved denaturation at 92°C for 1 minute, annealing at 52°C for 5 minutes, and elongation at 72°C for 1.5 minutes. The amplified samples were extracted twice with phenol/CHCl3 and once with CHCl3, ethanol precipitated, and stored at -70°C in 10 mM Tris-HCl, pH 7.5, 1 mM EDTA. The resultant products were used in constructing the M13IX30 and M13IX11 libraries (see below).



Vector Construction



Two M13-based vectors, M13IX30 (SEQ ID NO: 1) and M13IX11 (SEQ ID NO: 2), were constructed for the cloning and propagation of Hc and Lc populations of antibody fragments, respectively. The vectors were constructed to facilitate the random joining and subsequent surface expression of antibody fragment populations.

M13IX30 (SEQ ID NO: 1), or the Hc vector, was constructed to harbor diverse populations of Hc antibody fragments. M13mp19 (Pharmacia, Piscataway, NJ) was the starting vector. This vector was modified to contain, in addition to the encoded wild type M13 gene VIII: (1) a pseudo-wild type gene VIII sequence with an amber stop codon between it and the restriction sites for cloning oligonucleotides; (2) a Stu I restriction site for insertion of sequences by hybridization, and Spe I and Xho I restriction sites in frame with the pseudo-wild type gene VIII for cloning Hc sequences; (3) sequences necessary for expression, such as a promoter, signal sequence and translation initiation signals; (4) two pairs of Hind III-Mlu I sites for random joining of Hc and Lc vector portions; and (5) various other mutations to remove redundant restriction sites and the amino terminal portion of Lac Z.



In the first step, an M13-based vector containing the pseudo-wild type gene VIII and various other mutations was constructed (M13IX01F). The second step involved the construction of a separate M13mp18-derived vector, M13IX03. This vector was then expanded to contain expression sequences and restriction sites for Hc sequences to form M13IX04B. The fourth and final step involved the incorporation of the newly constructed sequences in M13IX04B into M13IX01F to yield M13IX30.

Construction of M13IX01F first involved the generation of a pseudo-wild type gene VIII sequence for surface expression of antibody fragments. The pseudo-wild type gene encodes the identical amino acid sequence as that of the wild type gene; however, the nucleotide sequence has been altered so that only 63% identity exists between this gene and the encoded wild type gene VIII. Modification of the gene VIII nucleotide sequence used for surface expression reduces the possibility of homologous recombination with the wild type gene VIII contained on the same vector. Additionally, the wild type M13 gene VIII was retained in the vector system to ensure that at least some functional, non-fusion coat protein would be produced. The inclusion of wild type gene VIII facilitates the growth of phage under conditions where there is surface expression of the antibody fragments and therefore reduces the possibility of non-viable phage production from the fusion genes.
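Two properties are claimed for the pseudo-wild type gene: it encodes the same protein, and it has reduced nucleotide identity. Both are straightforward to check computationally. The Python sketch below assumes the standard genetic code and an ungapped, position-by-position comparison. It translates the coding portion of the pseudo-wild type gene as read from the top-strand oligonucleotides VIII 03 to VIII 07 of Table III. The result should be the 50-residue mature M13 gene VIII coat protein. The wild type nucleotide sequence is not reproduced in this section, so percent identity is shown only on a hypothetical three-codon pair. The helper names are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate an in-frame coding sequence (length divisible by 3)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def percent_identity(a: str, b: str) -> float:
    """Position-by-position identity of two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)

# Coding portion of the pseudo-wild type gene VIII, from the GCT codon after
# the amber stop to the codon before TAA (top strands VIII 03 to VIII 07).
PSEUDO_VIII = (
    "GCT GAA GGC GAT GAC CCT GCT AAG GCT GCA TTC AAT AGT TTA CAG GCA AGT "
    "GCT ACT GAG TAC ATT GGC TAC GCT TGG GCT ATG GTA GTA GTT ATA GTT GGT "
    "GCT ACC ATA GGG ATT AAA TTA TTC AAA AAG TTT ACG AGC AAG GCT TCT"
).replace(" ", "")

print(translate(PSEUDO_VIII))
# Hypothetical synonymous pair: both encode Ala-Glu-Gly.
print(translate("GCTGAAGGC"), translate("GCAGAGGGT"))
print(round(percent_identity("GCTGAAGGC", "GCAGAGGGT"), 2))
```

Applied to the full wild type and pseudo-wild type coding sequences, the same two functions would confirm identical translations and give the nucleotide identity figure.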



The pseudo-wild type gene VIII was constructed by chemically synthesizing a series of oligonucleotides which encode both strands of the gene. The oligonucleotides are presented in Table III.

TABLE III
Pseudo-Wild Type Gene VIII Oligonucleotides

Top Strand
Oligonucleotide   Sequence (5' to 3')
VIII 03   GATCC TAG GCT GAA GGC GAT GAC CCT GCT AAG GCT GC
VIII 04   A TTC AAT AGT TTA CAG GCA AGT GCT ACT GAG TAC A
VIII 05   TT GGC TAC GCT TGG GCT ATG GTA GTA GTT ATA GTT GGT GCT ACC ATA GGG ATT
VIII 06   AAA TTA TTC AAA AAG TT
VIII 07   T ACG AGC AAG GCT TCT TA

Bottom Strand
Oligonucleotide   Sequence (5' to 3')
VIII 08   AGC TTA AGA AGC CTT GCT CGT AAA CTT TTT GAA TAA TTT
VIII 09   AAT CCC TAT GGT AGC ACC AAC TAT AAC TAC TAC CAT AGC CCA AGC GTA GCC AAT
VIII 10   GTA CTC AGT AGC ACT TG
VIII 11   C CTG TAA ACT ATT GAA TGC AGC CTT AGC AGG GTC
VIII 12   ATC GCC TTC AGC CTA G



The internal oligonucleotides (all oligonucleotides of Table III except the terminal oligonucleotides VIII 03 (SEQ ID NO: 27) and VIII 08 (SEQ ID NO: 32), i.e., VIII 04-07 (SEQ ID NOS: 28 through 31, respectively) and VIII 09-12 (SEQ ID NOS: 33 through 36, respectively)) were mixed at 200 ng each, phosphorylated with T4 polynucleotide kinase, heated to 70°C for 5 minutes, and annealed into double-stranded form by heating to 65°C for 3 minutes, followed by cooling to room temperature over a period of 30 minutes. The reactions were treated with 1.0 U of T4 DNA ligase (BRL) and 1 mM ATP at room temperature for 1 hour. The terminal oligonucleotides were then annealed to the ligated oligonucleotides. The annealed and ligated oligonucleotides yielded a double-stranded DNA flanked by a Bam HI site at its 5' end and by a Hind III site at its 3' end. A translational stop codon, amber (TAG), immediately follows the Bam HI site. The gene VIII sequence begins with the codon GCT (Ala) immediately following the stop codon. The double-stranded insert was cloned in frame with the Eco RI and Sac I sites within the M13 polylinker. To do so, M13mp19 was digested with Bam HI (New England Biolabs, Beverley, MA) and Hind III (New England Biolabs) and combined at a molar ratio of 1:10 with the double-stranded insert. The ligations were performed at room temperature overnight in 1X ligase buffer (50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml BSA) containing 1.0 U of T4 DNA ligase (New England Biolabs). The ligation products were then screened for positive clones.
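A molar ratio such as the 1:10 vector-to-insert ratio used above is set by mass. Moles of double-stranded DNA scale with mass divided by length. A minimal Python sketch of that conversion follows. The helper name and the example lengths and masses are hypothetical and are not values from the text.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_per_vector: float) -> float:
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Moles of double-stranded DNA scale with mass / length, so
    insert_ng = vector_ng * (insert_bp / vector_bp) * insert_per_vector.
    """
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector

# Hypothetical example: 100 ng of a 7,000 bp vector and a 700 bp insert
# combined at a 1:10 vector:insert molar ratio.
print(insert_mass_ng(100, 7000, 700, 10))
```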

Several mutations were generated within the construct to yield functional M13IX01F, using the method of Kunkel et al., Meth. Enzymol. 154:367-382 (1987), which is incorporated herein by reference, for site-directed mutagenesis. The reagents, strains and protocols were obtained from a Bio-Rad Mutagenesis kit (Bio-Rad, Richmond, CA), and mutagenesis was performed as recommended by the manufacturer.

Two Fok I sites were removed from the vector, as well as the Hind III site at the end of the pseudo-wild type gene VIII sequence, using the mutant oligonucleotides 5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 37) and 5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 38). New Hind III and Mlu I sites were also introduced at positions 3919 and 3951 of M13IX01F. The oligonucleotides used for this mutagenesis had the sequences 5'-ATATATTTTAGTAAGCTTCATCTTCT-3' (SEQ ID NO: 39) and 5'-GACAAAGAACGCGTGAAAACTTT-3' (SEQ ID NO: 40), respectively. The amino terminal portion of Lac Z was deleted by oligonucleotide-directed mutagenesis using the mutant oligonucleotide 5'-GCGGGCCTCTTCGCTATTGCTTAAGAAGCCTTGCT-3' (SEQ ID NO: [...]). In constructing the above mutations, all changes made in an M13 coding region were performed such that the amino acid sequence remained unaltered. The resultant vector, M13IX01F, was used in the final step to construct M13IX30 (see below).

In the second step, M13mp18 was mutated to remove the 5' end of Lac Z up to the Lac i binding site and including the Lac Z ribosome binding site and start codon. Additionally, the polylinker was removed and an Mlu I site was introduced in the coding region of Lac Z. A single oligonucleotide was used for these mutations and had the sequence 5'-AAACGACGGCCAGTGCCAAGTGACGCGTGTGAAATTGTTATCC-3' (SEQ ID NO: 43). Restriction enzyme sites for Hind III and Eco RI were introduced downstream of the Mlu I site using a mutant oligonucleotide. These modifications of M13mp18 yielded the precursor vector M13IX03.

The expression sequences and cloning sites were introduced into M13IX03 by chemically synthesizing a series of oligonucleotides which encode both strands of the sequence. The oligonucleotides are presented in Table IV.

TABLE IV
M13IX30 Oligonucleotide Series

Top Strand
Oligonucleotide   Sequence (5' to 3')
084   GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG
027   TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT
028   [...]TACCCCTGTGACAAAAGCCGCCCAGGTCC[...]
029   [...]GGCCTATTGTGCCCAGGGATTGTACTAG[...]

Bottom Strand
Oligonucleotide   Sequence (5' to 3')
085   TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG
031   GGCACAATAGGCCTGACTCGAGCAGCTGGA[...]
032   TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCAGTGCAATAGTGCTTTGTTTCA
033   CTTTATTTTCTCCATGTACAA
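The top- and bottom-strand oligonucleotides of Table IV are designed to overlap and anneal. Their complementarity can be checked with a reverse complement. The Python sketch below measures how many bases at the 5' end of a bottom-strand oligonucleotide pair with the 3' end of a top-strand one. It uses 084 and 033 as listed in the table. The function names are hypothetical, and the check assumes perfect Watson-Crick pairing with no mismatches.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_overlap(top: str, bottom: str) -> int:
    """Longest k such that the first k bases of `bottom` (its 5' end)
    pair perfectly with the last k bases of `top` (its 3' end)."""
    best = 0
    for k in range(1, min(len(top), len(bottom)) + 1):
        if top[-k:] == reverse_complement(bottom[:k]):
            best = k
    return best

OLIGO_084 = "GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG"   # top strand
OLIGO_033 = "CTTTATTTTCTCCATGTACAA"                  # bottom strand

print(reverse_complement(OLIGO_033))
print(paired_overlap(OLIGO_084, OLIGO_033))
```

Here the whole of 033 pairs with the 3' end of 084, which is the overlap that lets 084 serve as a terminal oligonucleotide and PCR primer for the assembled insert.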



The above oligonucleotides of Table IV, except for the terminal oligonucleotides 084 (SEQ ID NO: 44) and 085 (SEQ ID NO: 48), were mixed, phosphorylated, annealed and ligated to form a double-stranded insert as described in Example I. However, instead of cloning directly into the intermediate vector, the insert was first amplified by PCR. The terminal oligonucleotides were used as primers for PCR. Oligonucleotide 084 (SEQ ID NO: 44) contains a Hind III site 10 nucleotides internal to its 5' end, and oligonucleotide 085 (SEQ ID NO: 48) has an Eco RI site at its 5' end. Following amplification, the products were restricted with Hind III and Eco RI and ligated, as described in Example I, into the polylinker of M13mp18 digested with the same two enzymes. The resultant double-stranded insert contained a ribosome binding site, a translation initiation codon followed by a leader sequence, and three restriction enzyme sites for cloning oligonucleotides (Xho I, Stu I, Spe I). The intermediate vector was named M13IX04.

WO 92/06204 PCT/US91/07149

During cloning of the double-stranded insert, it was found that one of the two GCC codons in oligonucleotide 028 (and its complement in 031) was deleted. Additionally, oligonucleotide 032 (SEQ ID NO: 50) contained a GTG codon where a GAG codon was needed. Mutagenesis was performed using the oligonucleotide 5'-TAACGGTAAGAGTGCCAGTGC-3' (SEQ ID NO: 52) to convert the codon to the desired sequence. The resultant vector is named M13IX04B.

The third step in constructing M13IX30 involved inserting the expression and cloning sequences of M13IX04B into M13IX01F. This was accomplished by digesting M13IX04B with Dra III and Bam HI and gel isolating the 700 base pair insert containing the sequences of interest. M13IX01F was likewise digested with Dra III and Bam HI. The insert was combined with the double-digested vector and ligated as described in Example I. The sequence of the final construct, M13IX30, is shown in Figure 2 (SEQ ID NO: 1). Figure 1A also shows M13IX30, where each of the elements necessary for surface expression of Hc fragments is marked. It should be noted that during modification of the vectors, certain sequences differed from the published sequence of M13mp18. The new sequences are incorporated into the sequences recorded herein.

M13IX11 (SEQ ID NO: 2), or the Lc vector, was constructed to harbor diverse populations of Lc antibody fragments. This vector was also constructed from M13mp19 and contains: (1) sequences necessary for expression, such as a promoter, signal sequence and translation initiation signals; (2) an Eco RV restriction site for insertion of sequences by hybridization, and Sac I and Xba I restriction sites for cloning of Lc sequences; (3) two pairs of Hind III-Mlu I sites for random joining of Hc and Lc vector portions; and (4) various other mutations to remove redundant restriction sites.

The expression sequences, translation initiation signals, cloning sites, and one of the Mlu I sites were constructed by annealing of overlapping oligonucleotides as described above to produce a double-stranded insert containing a 5' Eco RI site and a 3' Hind III site. The overlapping oligonucleotides are shown in Table V and were ligated as a double-stranded insert between the Eco RI and Hind III sites of M13mp18 as described for the expression sequences inserted into M13IX03. The ribosome binding site (AGGAGAC) is located in oligonucleotide 015 and the translation initiation codon (ATG) is the first three nucleotides of oligonucleotide 016 (SEQ ID NO: 55).

TABLE V
Oligonucleotides for Construction of Translation Signals in M13IX11

Oligonucleotide   Sequence (5' to 3')
082   CACC TTCATG AATTC GGC AAG GAGACA GTCAT
015   AATT C GCC AAG GAG ACA GTC AT
016   AATG AAA TAC CTA TTG CCT ACG GCA GCC GCT GGA TTG TT
017   ATTA CTC GCT GCC CAA CCA GCC ATG GCC GAG CTC GTG AT
018   GACC CAG ACT CCA GATATC CAA CAG GAA TGA GTG TTA AT
019   TCT AGA ACG CGT C
083   TTCAGGTTGAAGC TTA CGC GTT CTA GAA TTA ACA CTC ATT CCTGT
021   TG GAT ATC TGG AGT CTG GGT CAT CAC GAG CTC GGC CAT G
022   GC TGG TTG GGC AGC GAG TAA TAA CAA TCC AGC GGC TGC C
023   GT AGG CAA TAG GTA TTT CAT TAT GAC TGT CCT TGG CG

Oligonucleotide 017 (SEQ ID NO: 56) contained a Sac I restriction site 67 nucleotides downstream from the ATG codon. The naturally occurring Eco RI site was removed and new Eco RI and Hind III sites were introduced downstream from the Sac I site. Oligonucleotides 5'-TGACTGTCTCCTTGGCGTGTGAAATTGTTA-3' (SEQ ID NO: 63) and 5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 64) were used to generate each of the mutations, respectively. The Lac Z ribosome binding site was removed when the original Eco RI site in M13mp19 was mutated [...] when the new Eco RI and Hind III sites were introduced. Additionally, a spontaneous 100 bp deletion was found [...]; since the deletion does not affect function, it was retained in the final vector.

In addition to the above mutations, other modifications were made. The Hind III site used to clone the double-stranded insert was removed with the oligonucleotide 5'-CCCACTGCCAACTCACCCCTTCTA-3'. New Hind III and Mlu I sites were introduced [...], using the oligonucleotide 5'-ATATATTTTAGTAAGCTTCATCTTCT-3' for the Hind III site. The sequence of the resultant vector, M13IX11, is shown in Figure 3 (SEQ ID NO: 2). Figure 1B shows M13IX11, where each of the elements necessary for surface expression of Lc fragments is marked.



Library Construction

The populations of Hc and Lc sequences synthesized by PCR above are separately cloned into M13IX30 and M13IX11, respectively, to create Hc and Lc libraries. The PCR products are ethanol precipitated and treated with T4 DNA polymerase [...], and the reactions are incubated [...] to generate single-stranded termini by exonuclease digestion. Reactions are stopped by heating. M13IX30 is digested with Stu I and M13IX11 is digested with Eco RV. Both vectors are treated with T4 DNA polymerase as described above and combined with the appropriate PCR products at a 1:1 molar ratio at [...] ng/µl to anneal in the above buffer at room temperature overnight. DNA from each annealing is electroporated into MK30-3 (Boehringer, Indianapolis, IN), as described below, to generate the Hc and Lc libraries.

MK30-3 is electroporated as described by Smith et al., Focus 12:38-40 (1990), which is incorporated herein by reference. The cells are prepared by inoculating a fresh colony of MK30-3 into 5 mls of SOB without magnesium (30 g bacto-tryptone, 5 g bacto-yeast extract, 0.584 g NaCl, [...] g KCl, dH2O to 1,000 mls) and grown with vigorous aeration overnight at 37°C. SOB without magnesium (500 ml) is inoculated at 1:1000 with the overnight culture and grown with vigorous aeration at 37°C until the OD reaches [...] (about 2 to 3 h). The cells are harvested by centrifugation at 5,000 rpm (2,600 x g) in a GS3 rotor, resuspended in 500 ml of ice-cold 10% (v/v) sterile glycerol, centrifuged and resuspended a second time in the same manner. After a third centrifugation, the cells are resuspended in 10% sterile glycerol at a final volume such that the OD550 of the suspension is 200 to 300. Usually, resuspension is achieved in the 10% glycerol that remained in the bottle after pouring off the supernate. Cells are frozen in 40 µl aliquots in microcentrifuge tubes using a dry ice-ethanol bath and stored frozen at -70°C.
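The harvest step quotes a rotor speed together with a relative centrifugal force, read here as 5,000 rpm and 2,600 x g. The usual conversion is RCF = 1.118 x 10^-5 x r x N^2, with r the rotor radius in centimetres and N the speed in rpm. The Python sketch below applies it in both directions. The roughly 9.3 cm radius it derives is not a rotor specification. It is simply the radius implied by that rpm/RCF pair, so treat it as an assumption when transferring the spin to another rotor. The function names are hypothetical.

```python
def rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) from speed (rpm) and radius (cm)."""
    return 1.118e-5 * radius_cm * rpm ** 2

def implied_radius_cm(rpm: float, rcf_g: float) -> float:
    """Rotor radius (cm) consistent with a quoted rpm / x g pair."""
    return rcf_g / (1.118e-5 * rpm ** 2)

r = implied_radius_cm(5000, 2600)
print(round(r, 2))           # radius implied by 5,000 rpm and 2,600 x g
print(round(rcf(5000, r)))   # round trip back to the quoted force
```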

Frozen cells are electroporated by thawing slowly on ice before use and mixing with about 10 pg to 500 ng of vector per 40 µl of cell suspension. A 40 µl aliquot is placed in a 0.1 cm electroporation chamber (Bio-Rad, Richmond, CA) and pulsed once at 0°C using a 4 kΩ parallel resistor, 25 µF, 1.88 kV, which gives a pulse length (τ) of about 4 ms. A 10 µl aliquot of the pulsed cells is diluted into 1 ml SOC (98 mls SOB plus 1 ml of 2 M MgCl2 and 1 ml of 2 M glucose) in a 12 x 75-mm culture tube, and the culture is shaken at 37°C for 1 hour prior to culturing in selective media (see below).
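The SOC recipe and the NaCl amount in SOB reduce to simple dilution arithmetic (C1 x V1 = C2 x V2, and grams divided by molar mass). The Python sketch below shows that 1 ml each of 2 M MgCl2 and 2 M glucose in a 100 ml total volume gives 20 mM of each. It also shows that 0.584 g NaCl per litre is about 10 mM. The NaCl molar mass of 58.44 g/mol is a standard value supplied here, not taken from the text. The helper names are illustrative.

```python
def final_conc(stock: float, stock_volume: float, total_volume: float) -> float:
    """C1*V1 = C2*V2: concentration after diluting a stock into a final volume."""
    return stock * stock_volume / total_volume

def grams_to_mM(grams: float, molar_mass: float, litres: float) -> float:
    """Millimolar concentration of a solute dissolved to a given volume."""
    return 1000.0 * grams / molar_mass / litres

total_ml = 98 + 1 + 1                              # SOB + MgCl2 + glucose
mgcl2_mM = 1000 * final_conc(2.0, 1, total_ml)     # 2 M stock, 1 ml
glucose_mM = 1000 * final_conc(2.0, 1, total_ml)   # 2 M stock, 1 ml
nacl_mM = grams_to_mM(0.584, 58.44, 1.0)           # NaCl in SOB, per litre

print(mgcl2_mM, glucose_mM, round(nacl_mM, 1))
```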

Each of the libraries is cultured using methods known to one skilled in the art. Such methods can be found in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, 1989, and in Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, New York, 1989, both of which are incorporated herein by reference. Briefly, the above 1 ml library cultures are grown up by diluting 50-fold into 2XYT media (16 g tryptone, 10 g yeast extract, 5 g NaCl per liter) and culturing at 37°C for 5-8 hours. The bacteria are pelleted by centrifugation at 10,000 x g. The supernatant containing phage is transferred to a sterile tube and stored at 4°C.



Double-stranded vector DNA containing Hc and Lc antibody fragments is isolated from the cell pellet of each library. Briefly, the pellet is washed in TE (10 mM Tris, pH 8.0, 1 mM EDTA) and recollected by centrifugation at 7,000 rpm for 5 minutes in a Sorvall centrifuge (Newtown, CT). Pellets are resuspended in 6 mls of 10% sucrose, 50 mM Tris, pH 8.0. Then 3.0 ml of 10 mg/ml lysozyme is added and the mixture is incubated on ice for 20 minutes. Next, 12 mls of 0.2 M NaOH, 1% SDS is added, followed by 10 minutes on ice. The suspensions are then incubated on ice for 20 minutes after addition of 7.5 mls of 3 M NaOAc, pH 4.6. The samples are centrifuged at 15,000 rpm for 15 minutes at 4°C, RNased and extracted with phenol/chloroform, followed by ethanol precipitation. The pellets are resuspended and weighed, and an equal weight of CsCl is dissolved into each tube until a density of 1.60 g/ml is achieved. EtBr is added to 600 µg/ml and the double-stranded DNA is isolated by equilibrium centrifugation in a TV-1665 rotor (Sorvall) at 50,000 rpm for 6 hours. The resulting left half and right half sublibrary DNAs are used to form the surface expression libraries, as described below.

The surface expression library is formed by the random joining of the Hc-containing portion of M13IX30 with the Lc-containing portion of M13IX11. The DNAs isolated from each library are digested separately with an excess amount of restriction enzyme: one population is digested with Hind III and the other with Mlu I. The reactions are stopped by phenol/chloroform extraction followed by ethanol precipitation. The pellets are then subjected to exonuclease digestion, with the reactions incubated at 30°C and stopped by heating. The Hc and Lc DNAs are mixed to a final concentration of 10 ng of each vector/µl and allowed to anneal at room temperature overnight. The mixture is electroporated into MK30-3 cells as described above.
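Joining the two sublibraries is combinatorial. Any Hc half can, in principle, be joined to any Lc half, so the number of distinct heavy/light pairings is the product of the two sublibrary sizes. The toy Python simulation below illustrates this. It uses hypothetical clone names and sizes, and it makes the simplifying assumption that each joining event picks an Hc half and an Lc half uniformly and independently. Real joining efficiencies will not be uniform. The function name `random_join` is hypothetical.

```python
import random

def random_join(hc_clones: list[str], lc_clones: list[str], n: int,
                seed: int = 0) -> list[tuple[str, str]]:
    """Simulate n joining events, each pairing one Hc half with one Lc half
    chosen independently and uniformly at random."""
    rng = random.Random(seed)
    return [(rng.choice(hc_clones), rng.choice(lc_clones)) for _ in range(n)]

# Hypothetical toy sublibraries.
hc = [f"Hc{i}" for i in range(1, 4)]   # 3 heavy chain fragments
lc = [f"Lc{j}" for j in range(1, 5)]   # 4 light chain fragments

possible = len(hc) * len(lc)           # distinct heteromers that can form
pairs = random_join(hc, lc, n=1000, seed=42)
observed = len(set(pairs))             # distinct pairings actually sampled

print(possible, observed)
```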
Screening of Surface Expression Libraries



Purified phage are prepared from 50 ml cultures of [...] cells which had been infected at a multiplicity of infection of 10 from phage stocks stored at 4°C. The cultures are induced with 2 mM IPTG. Supernatants are cleared by two centrifugations, and the phage are precipitated by adding PEG solution (25% PEG-8000, 2.5 M NaCl), followed by incubation at 4°C overnight. The precipitate is recovered by centrifugation for 90 minutes at 10,000 x g. Phage pellets are resuspended in 25 ml of 0.01 M Tris-HCl, pH 7.6, 1.0 mM EDTA, and 0.1% sarkosyl and then shaken slowly at room temperature for 30 minutes. The solutions are adjusted to 0.5 M NaCl and to a final concentration of 5% polyethylene glycol. After 2 hours at 4°C, the precipitates containing the phage are recovered by centrifugation for 1 hour at 15,000 x g. The precipitates are resuspended in 10 ml of NET buffer (0.1 M NaCl, 1.0 mM EDTA, and 0.01 M Tris-HCl, pH 7.6), mixed well, and the phage repelleted by centrifugation at 170,000 x g for 3 hours. The phage pellets are resuspended overnight in 2 ml of NET buffer and subjected to cesium chloride centrifugation for 18 hours at 110,000 x g (3.86 g of cesium chloride in 10 ml of buffer). Phage bands are collected, diluted 7-fold with NET buffer, recentrifuged at 170,000 x g for 3 hours, resuspended, and stored at 4°C in 0.3 ml of NET buffer containing 0.1 mM sodium azide.



The BDP used for panning on streptavidin-coated dishes is first biotinylated and then absorbed against UV-inactivated blocking phage (see below). The biotinylating reagents are dissolved in dimethyl formamide at a ratio of 2.4 mg solid NHS-SS-Biotin (sulfosuccinimidyl 2-(biotinamido)ethyl-1,3'-dithiopropionate; Pierce, Rockford, IL) to 1 ml solvent and used as recommended by the manufacturer. Small-scale reactions are accomplished by mixing 1 µl dissolved reagent with 43 µl of 1 mg/ml BDP diluted in sterile bicarbonate buffer (0.1 M NaHCO3, pH 8.6). After 2 hours at 25°C, residual biotinylating reagent is reacted with 500 µl 1 M ethanolamine (pH adjusted to 9 with HCl) for an additional 2 hours. The entire sample is diluted with 1 ml TBS containing 1 mg/ml BSA, concentrated to about 50 µl on a Centricon 30 ultrafilter (Amicon), and washed on the same filter three times with 2 ml TBS and once with 1 ml TBS containing 0.02% NaN3 and 7 x 10^12 UV-inactivated blocking phage (see below); the final retentate (60-80 µl) is stored at 4°C. BDP biotinylated with the NHS-SS-Biotin reagent is linked to biotin via a disulfide-containing chain.

UV-irradiated M13 phage are used for blocking any biotinylated BDP which fortuitously binds filamentous phage in general. M13mp8 (Messing and [...], which is incorporated herein by reference) is used [...] so that the few phage surviving irradiation will not grow in the sup O strains used to titer the surface expression library. A 5 ml sample containing 5 x 10^[...] M13mp8 phage, purified as described above, is placed in a small petri plate and irradiated with a germicidal lamp at a distance of two feet for 7 minutes (flux 150 [...]). NaN3 is added [...] and the phage particles are concentrated to 10^[...] particles/ml on a Centricon 30-kDa ultrafilter (Amicon).

For panning, polystyrene petri plates (60 x 15 mm) are incubated with 1 ml of 1 mg/ml streptavidin (BRL) in 0.1 M NaHCO3, pH 8.6, 0.02% NaN3 in a small, air-tight plastic box overnight in a cold room. The next day, the streptavidin is removed and replaced with at least 10 ml blocking solution ([...] mg/ml of BSA, 3 µg/ml of streptavidin, 0.1 M NaHCO3, pH 8.6, 0.02% NaN3) and incubated at room temperature. The blocking solution is removed and plates are washed rapidly three times with Tris buffered saline containing 0.5% Tween 20 (TBS-0.5% Tween 20).

Selection of phage expressing antibody fragments which bind BDP is performed with 5 µl (2.7 µg BDP) of blocked biotinylated BDP reacted with a 50 µl portion of the library. Each mixture is incubated overnight at 4°C, diluted with 1 ml TBS-0.5% Tween 20, and transferred to a streptavidin-coated petri plate prepared as described above. After rocking for 10 minutes at room temperature, unbound phage are removed and plates washed ten times with TBS-0.5% Tween 20 over a period of 30-90 minutes. Bound phage are eluted with elution buffer (1 mg/ml BSA, 0.1 N HCl, pH adjusted to [...] with glycine) for 15 minutes and eluates neutralized with 48 µl 2 M Tris (pH unadjusted). A 20 µl portion of each eluate is titered on MK30-3 concentrated cells with dilutions of input phage.

A second round of panning is performed by treating 750 µl of first eluate from the library with 5 mM DTT for 10 minutes to break disulfide bonds linking biotin groups to residual biotinylated binding proteins. The treated eluate is concentrated on a Centricon 30 ultrafilter (Amicon), washed three times with TBS-0.5% Tween 20, and concentrated to a final volume of about 50 µl. The final retentate is transferred to a tube containing 5.0 µl (2.7 µg BDP) of blocked biotinylated BDP and incubated overnight. The solution is diluted with 1 ml TBS-0.5% Tween 20, panned, and eluted as described above on fresh streptavidin-coated petri plates. The entire second eluate (800 µl) is neutralized with 48 µl 2 M Tris, and 20 µl is titered simultaneously with the first eluate and dilutions of the input phage. If necessary, further rounds of panning can be performed to obtain homogeneous populations of phage. Additionally, phage can be plaque purified if reagents are available for detection.
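Each eluate is titered alongside dilutions of the input phage. The fraction of input recovered in each round, and how that fraction changes between rounds, are therefore simple ratios. The Python sketch below computes them. All titers are hypothetical because the text reports none. A rising yield across rounds is a commonly used but not conclusive sign that binders are being enriched. The function names are illustrative.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, volume_ml: float) -> float:
    """Plaque-forming units per ml from a plaque count.

    `dilution` is the fraction of the original sample in the plated material,
    e.g. 1e-6 for a 10^-6 dilution.
    """
    return plaques / (dilution * volume_ml)

def recovery(output_pfu: float, input_pfu: float) -> float:
    """Fraction of the input phage recovered in an eluate."""
    return output_pfu / input_pfu

# Hypothetical round 1 and round 2 numbers, for illustration only.
round1 = recovery(output_pfu=2.0e5, input_pfu=1.0e11)
round2 = recovery(output_pfu=4.0e6, input_pfu=2.0e9)

print(f"round 1 yield: {round1:.1e}")
print(f"round 2 yield: {round2:.1e}")
print(f"fold increase in yield: {round2 / round1:.0f}")
```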



Template Preparation and Sequencing

Templates are prepared for sequencing by inoculating a 1 ml culture of 2XYT containing a 1:100 dilution of an overnight culture of XL1 with an individual plaque from the purified population. The plaques are picked using a sterile toothpick. The culture is incubated at 37°C for 5-6 hours with shaking and then transferred to a 1.5 ml microfuge tube. PEG solution (200 µl) is added, followed by vortexing, and the tube is placed on ice for 10 minutes. The phage precipitate is recovered by centrifugation in a microfuge at 12,000 x g for 5 minutes. The supernatant is discarded and the pellet is resuspended in 230 µl of TE (10 mM Tris-HCl, pH 7.5, 1 mM EDTA) by gently pipeting with a yellow pipet tip. Phenol (200 µl) is added, followed by a brief vortex, and the sample is microfuged to separate the phases. The aqueous phase is transferred to a separate tube and extracted with 200 µl of phenol/chloroform as described above for the phenol extraction. A 0.1 volume of 3 M NaOAc is added, followed by addition of 2.5 volumes of ethanol, and the templates are precipitated at -20°C for 20 minutes. The precipitated templates are recovered by centrifugation in a microfuge at 12,000 x g for 8 minutes. The pellet is washed in 70% ethanol and resuspended in 25 µl [...]. Sequencing was performed using a Sequenase™ sequencing kit following the protocol supplied by the manufacturer (U.S. Biochemical, Cleveland, OH).



EXAMPLE II
Incorporation of Heavy and Light Chain Sequences Without Restriction Endonucleases

This example shows the simultaneous incorporation of antibody heavy and light chain fragment encoding sequences into an M13IXHL-type vector without the use of restriction endonucleases.

For the simultaneous incorporation of heavy and light chain encoding sequences into a single expression vector, an M13IXHL vector was produced that contained heavy and light chain encoding sequences for a mouse monoclonal antibody (DAN-18H4; Biosite, San Diego, CA). The inserted antibody fragment sequences are used as complementary sequences for the hybridization and incorporation of Hc and Lc sequences by site-directed mutagenesis. The genes encoding the heavy and light chain polypeptides were inserted into M13IX30 (SEQ ID NO: 1) and M13IX11 (SEQ ID NO: 2), respectively, and combined into a single surface expression vector. The resultant M13IXHL-type vector is termed M13IX50.

The combinations were performed under conditions that facilitate the formation of one Hc and one Lc vector half into a single circularized vector. Briefly, the overhangs generated between the pairs of restriction sites after restriction with Mlu I or Hind III and exonuclease digestion are unequal (i.e., 64 nucleotides compared to 32 nucleotides). These unequal lengths result in differential hybridization temperatures for specific annealing of the complementary ends from each vector. The specific hybridization of each end of each vector half was accomplished by first annealing at 65°C in a small volume (about 100 ng/µl) to form a dimer of one Hc vector half and one Lc vector half. The dimers were circularized by diluting the mixture (to about 20 ng/µl) and lowering the temperature to about 25-37°C to allow annealing. T4 ligase was present to covalently close the circular vectors.
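The two-temperature annealing relies on the 64-nucleotide overhangs melting at a higher temperature than the 32-nucleotide ones. The Python sketch below illustrates the size of that effect with a commonly quoted empirical approximation, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 600/N. This formula is a swapped-in rough guide, not the method of the text. The 50% GC content and 50 mM Na+ are hypothetical. With equal GC content, the length term alone separates the two overhangs by about 9.4°C. This is consistent with annealing the longer ends first at 65°C and the shorter ends only after cooling.

```python
import math

def oligo_tm(length: int, gc_fraction: float, na_molar: float) -> float:
    """Approximate duplex melting temperature (deg C).

    Empirical formula Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 600/N;
    only a rough guide for oligonucleotide-length duplexes.
    """
    return (81.5 + 16.6 * math.log10(na_molar)
            + 0.41 * (100.0 * gc_fraction) - 600.0 / length)

# Hypothetical conditions: 50% GC overhangs in 50 mM Na+.
tm_64 = oligo_tm(64, 0.5, 0.05)
tm_32 = oligo_tm(32, 0.5, 0.05)

print(round(tm_64, 1), round(tm_32, 1), round(tm_64 - tm_32, 2))
```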

M13IX50 was modified such that it did not produce a
functional polypeptide for the DAN monoclonal antibody. To
do this, about eight amino acids were changed within the
variable region of each chain by mutagenesis. The Lc
variable region was mutagenized using the oligonucleotide
5'-CTGAACCTGTCTGGGACCACAGTTGATGCTATAGGATCAGATCTAGAATTCATTTAGAGACTGGCCTGGCTTCTGC-3'
(SEQ ID NO: 68). The Hc sequence was mutagenized with the
oligonucleotide
5'-TCGACCGTTGGTAGGAATAATGCAATTAATGGAGTAGCTCTAAATTCAGAATTCATCTACACCCAGTGCATCCAGTAGCT-3'
(SEQ ID NO: 69). An additional mutation was also introduced
into M13IX50 to yield the final form of the vector. During
construction of an intermediate to M13IX50 (M13IX04,
described in Example I), a six-nucleotide sequence was
duplicated in oligonucleotide 027 and its complement 032.
This sequence, 5'-TTACCG-3', was deleted by mutagenesis using
the oligonucleotide 5'-GGTAAACAGTAACGGTAAGAGTGCCAG-3' (SEQ
ID NO: 70). The resultant vector was designated M13IX53.
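The deletion oligonucleotide (SEQ ID NO: 70) can be checked against the vector sequence. The sketch below takes a short stretch of M13IX30 as printed in SEQ ID NO: 1 (around position 6320, where the tandem 5'-TTACCG-3' repeat appears), removes one copy of the repeat, and confirms that the reverse complement of the oligonucleotide matches the edited stretch but not the original. Treating the oligonucleotide as complementary to the printed strand, and the helper names, are assumptions of this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def delete_one_tandem_copy(seq: str, motif: str) -> str:
    """Remove one copy of a tandemly duplicated motif (motif+motif -> motif)."""
    idx = seq.find(motif + motif)
    if idx < 0:
        raise ValueError(f"no tandem copy of {motif} found")
    return seq[:idx] + seq[idx + len(motif):]


# Stretch of SEQ ID NO: 1 containing the duplicated 5'-TTACCG-3'.
template = "CACTCTTACCGTTACCGTTACTGTTTACC"
oligo_70 = "GGTAAACAGTAACGGTAAGAGTGCCAG"  # SEQ ID NO: 70, written 5'->3'

edited = delete_one_tandem_copy(template, "TTACCG")
target = revcomp(oligo_70)  # the strand sequence the oligo would pair with

print(edited)
print(edited in target, template in target)
```

The oligonucleotide pairs perfectly only with the single-copy version, which is why annealing it to the template and extending it removes the redundant six nucleotides.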






M13IX53 can be produced in single-stranded form and
contains all the functional elements of the previously
described M13IXHL vector except that it does not express
functional antibody heteromers. The single-stranded vector
can be hybridized to populations of single-stranded Hc and
Lc encoding sequences for their incorporation into the
vector by mutagenesis. Populations of single-stranded Hc
and Lc encoding sequences can be produced by one skilled in
the art from the PCR products described in Example I or by
other methods known to one skilled in the art using the
primers and teachings described therein. The resultant
vectors, with Hc and Lc encoding sequences randomly
incorporated, are propagated and screened for desired
binding specificities as described in Example I.
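Because each vector receives one heavy chain and one light chain sequence drawn from two separate populations, the number of possible pairings is the product of the population sizes, while the number of distinct pairings actually recovered grows more slowly than the number of clones. The sketch below illustrates this with hypothetical clone names and assumes every pairing is equally likely, which a real library need not satisfy.

```python
import random
from itertools import product


def possible_pairings(n_heavy: int, n_light: int) -> int:
    """Upper bound on distinct Hc/Lc combinations in the library."""
    return n_heavy * n_light


def random_pairs(heavy, light, n_clones, seed=0):
    """Each clone receives one Hc and one Lc drawn independently and uniformly."""
    rng = random.Random(seed)
    return [(rng.choice(heavy), rng.choice(light)) for _ in range(n_clones)]


def expected_distinct(n_pairings: int, n_clones: int) -> float:
    """Expected number of different pairings among n_clones uniform draws."""
    return n_pairings * (1.0 - (1.0 - 1.0 / n_pairings) ** n_clones)


heavy = ["H1", "H2", "H3"]          # hypothetical Hc clones
light = ["L1", "L2", "L3", "L4"]    # hypothetical Lc clones
clones = random_pairs(heavy, light, n_clones=20)
all_pairings = set(product(heavy, light))

print(possible_pairings(len(heavy), len(light)))   # 12
print(set(clones) <= all_pairings)                 # True
print(len(set(clones)), "distinct pairings observed in", len(clones), "clones")
print(f"expected about {expected_distinct(12, 20):.1f}")
```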

Other vectors similar to M13IX53 and the vectors it is
derived from, M13IX11 and M13IX30, have also been produced
for the incorporation of Hc and Lc encoding sequences
without restriction. In contrast to M13IX53, these vectors
contain human antibody sequences for the efficient
hybridization and incorporation of populations of human Hc
and Lc sequences. These vectors are briefly described
below. The starting vectors were either the Hc vector
(M13IX30) or the Lc vector (M13IX11) previously described.

M13IX32 was generated from M13IX30 by removing the
six-nucleotide redundant sequence 5'-TTACCG-3' described above
and by mutating the leader sequence to increase secretion
of the product. The oligonucleotide used to remove the
redundant sequence is the same as that given above. The
mutation in the leader sequence was generated using the
oligonucleotide 5'-GGGCTTTTGCCACAGGGGT-3'. This mutagenesis
resulted in the A residue at position 6353 of M13IX30 being
changed to a G residue.
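A single-base change like this is typically made with an oligonucleotide that matches the template everywhere except at the target position. The sketch below shows that logic on a short hypothetical template (not the M13IX30 sequence): it checks the expected reference base at a 1-based position, substitutes it, and returns the complementary (antisense) oligonucleotide centred on the change. The function names and flank length are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def substitute(seq: str, pos: int, ref: str, alt: str) -> str:
    """Replace the base at 1-based position pos, checking it is ref first."""
    i = pos - 1
    if seq[i] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[i]}")
    return seq[:i] + alt + seq[i + 1:]


def mutagenic_oligo(seq: str, pos: int, ref: str, alt: str, flank: int = 9) -> str:
    """Antisense oligo spanning pos, with up to `flank` matched bases per side."""
    mutant = substitute(seq, pos, ref, alt)
    i = pos - 1
    window = mutant[max(0, i - flank): i + flank + 1]
    return revcomp(window)


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


template = "GGCATGCAAT"  # hypothetical sense-strand template
oligo = mutagenic_oligo(template, pos=4, ref="A", alt="G", flank=3)
print(oligo)                                        # GCACGCC
print(mismatches(revcomp(oligo), template[0:7]))    # 1
```

The single mismatch is flanked by perfectly paired bases on both sides, so the oligonucleotide still anneals stably while carrying the A to G change into the extended strand.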

A decapeptide tag for affinity purification of
antibody fragments was incorporated in the proper reading
frame at the carboxy-terminal end of the Hc expression site
in M13IX32. The oligonucleotide used for this mutagenesis
was 5'-...TAGGATCCACTAG-3' (SEQ ID NO: 71). The resultant
vector was designated M13IX33. Modifications to this or
other vectors can be envisioned which include features
known to one skilled in the art. For example, a peptidase
cleavage site can be incorporated following the decapeptide
tag, which allows the antibody to be cleaved from the
remainder of the fusion protein.
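Placing the tag "in the proper reading frame" amounts to inserting a whole number of codons at a codon boundary, so that the codons downstream of the insertion are read unchanged. The sketch below checks this on hypothetical sequences: a 30-nt placeholder standing in for a ten-codon tag (not the actual SEQ ID NO: 71 sequence) is inserted between an upstream coding segment and a downstream fusion-partner segment.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into successive triplets from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def insert_in_frame(cds: str, codon_index: int, insert: str) -> str:
    """Insert whole codons immediately before codon `codon_index` of `cds`."""
    if len(insert) % 3:
        raise ValueError("insert length is not a multiple of 3; frame would shift")
    pos = codon_index * 3
    return cds[:pos] + insert + cds[pos:]


upstream = "ATGGCTGCTGAA"        # hypothetical 4-codon upstream segment
downstream = "TCTGGTGCTTAA"      # hypothetical downstream segment ending in a stop
tag = "GAC" * 10                 # placeholder 10-codon tag (30 nt)

fused = insert_in_frame(upstream + downstream, len(upstream) // 3, tag)
print(len(tag) // 3, "codons inserted")
print(codons(fused)[-4:] == codons(downstream))   # True: downstream frame preserved
```

A 31-nt insert would be rejected, because every downstream codon would then be read out of frame.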

M13IX34 was created from M13IX33 by cloning in the gene
encoding a human IgG1 heavy chain. The reading frame of
the variable region was changed and a stop codon was
introduced to ensure that a functional polypeptide would
not be produced. The oligonucleotide used for the
mutagenesis of the variable region was
5'-CACCGGTTCGGGGAATTAGTCTTGACCAGGCAGCCCAGGGC-3' (SEQ ID NO:
72). The complete nucleotide sequence of this vector is
shown in Figure 4 (SEQ ID NO: 3).
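The effect of a frameshift can be seen by scanning codons for the first stop (TAA, TAG or TGA) in the reading frame. In the hypothetical open reading frame below, deleting a single base brings a stop codon into frame at the second codon, so translation would terminate almost immediately. The sequence is illustrative only and is not the M13IX34 variable region.

```python
STOPS = {"TAA", "TAG", "TGA"}


def first_stop(seq: str):
    """Index of the first in-frame (frame 0) stop codon, or None if absent."""
    for k in range(0, len(seq) - 2, 3):
        if seq[k:k + 3] in STOPS:
            return k // 3
    return None


orf = "ATGGTAAGCTGCTAA"           # hypothetical ORF: ATG GTA AGC TGC TAA
shifted = orf[:3] + orf[4:]       # delete one base (index 3) -> -1 frameshift

print(first_stop(orf))      # 4 (the natural stop)
print(first_stop(shifted))  # 1 (premature stop after the shift)
```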

Several vectors of the M13IX11 series were generated to
contain modifications similar to those described for the
vectors M13IX53 and M13IX34. The promoter region in M13IX11
was mutated to conform to the -35 consensus sequence to
generate M13IX12. The oligonucleotide used for this
mutagenesis was
5'-ATTCCACACATTATACGAGCCGGAAGCATAAAGTGTCAAGCCTGGGGTGCC-3'
(SEQ ID NO: 73). The human kappa light chain sequence was
cloned into M13IX12 and the variable region subsequently
deleted to generate M13IX13. The complete nucleotide
sequence of this vector is given as SEQ ID NO: 4. A similar
vector, designated M13IX14, was also generated in which the
human lambda light chain was inserted into M13IX12 followed
by deletion of the variable region. The oligonucleotides
used for the variable region deletion of M13IX13 and
M13IX14 were
5'-CTGCTCATCAGATGGCGGGAAGAGCTCGGCCATGGCTGGTTG-3' (SEQ ID NO: 74)
and
5'-GAACAGAGTGACCGAGGGGGCGAGCTCGGCCATGGCTGGTTG-3' (SEQ ID NO: 75),
respectively.
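The effect of the promoter mutagenesis oligonucleotide given above can be checked by comparing its reverse complement with the E. coli sigma-70 -35 consensus hexamer, TTGACA. The sketch below scores every hexamer by Hamming distance to the consensus. The "before" stretch is copied from the corresponding lac promoter region as printed in SEQ ID NO: 1; treating the oligonucleotide as antisense to that strand is an assumption of the sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CONSENSUS_35 = "TTGACA"  # E. coli sigma-70 -35 consensus hexamer


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def best_match(seq: str, motif: str):
    """(distance, index) of the window closest to motif by Hamming distance."""
    k = len(motif)
    scores = [
        (sum(a != b for a, b in zip(seq[i:i + k], motif)), i)
        for i in range(len(seq) - k + 1)
    ]
    return min(scores)  # ties resolved by the lowest index


before = "CCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAA"   # from SEQ ID NO: 1
promoter_oligo = "ATTCCACACATTATACGAGCCGGAAGCATAAAGTGTCAAGCCTGGGGTGCC"
after = revcomp(promoter_oligo)

for label, s in [("before", before), ("after", after)]:
    dist, idx = best_match(s, CONSENSUS_35)
    print(label, s[idx:idx + 6], "mismatches:", dist)
```

In this comparison the original TTTACA hexamer differs from the consensus at one position, and the mutagenized strand carries an exact TTGACA.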



The vectors modified as described above can be combined
using the methods described previously to produce a single
vector similar to M13IX53 that allows the efficient
incorporation of human Hc and Lc encoding sequences by
mutagenesis. An example of this is the combination of
M13IX13 with M13IX34. The nucleotide sequence of this
vector, M13IX60, is shown in Figure 6 (SEQ ID NO: 5).

Additional modifications to any of the previously
described vectors can also be performed to generate vectors
which allow the efficient incorporation and surface
expression of Hc and Lc encoding sequences. For example,
instead of uracil selection against the wild-type template
during mutagenesis procedures, the variable region
locations within the vectors can be substituted by a set of
palindromic restriction enzyme sites (i.e., similar sites
in opposite orientation). The palindromic sites will
hybridize together during mutagenesis and thus form a
double-stranded substrate for restriction endonuclease
digestion. Cleavage of the site results in the destruction
of the wild-type template. The variable region of the
inserted Hc or Lc sequences will not be affected since they
will be in single-stranded form.
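The selection idea relies on an inverted repeat: on a single strand, a site followed later by its reverse complement can fold back into a hairpin whose stem is a double-stranded, cleavable copy of the site. The sketch below tests whether the ends of a single-stranded stretch can pair into such a stem and which palindromic sites lie in it. EcoRI (GAATTC) and BamHI (GGATCC) are used only as familiar palindromic examples; the arm, spacer and minimum loop size are assumptions, and real hairpin formation also depends on thermodynamics the sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic_site(site: str) -> bool:
    """A restriction site is palindromic if it equals its reverse complement."""
    return site == revcomp(site)


def hairpin_stem(seq: str, min_loop: int = 3) -> str:
    """Longest 5' arm whose reverse complement ends the sequence, else "".

    Only arms at the two ends of the given stretch are considered.
    """
    for k in range((len(seq) - min_loop) // 2, 0, -1):
        if seq[:k] == revcomp(seq[-k:]):
            return seq[:k]
    return ""


arm = "GAATTCGGATCC"                     # hypothetical arm: EcoRI + BamHI sites
template = arm + "TTTT" + revcomp(arm)   # stretch carrying an inverted repeat
stem = hairpin_stem(template)

print(len(stem))
print([s for s in ("GAATTC", "GGATCC") if is_palindromic_site(s) and s in stem])
```

Single-stranded Hc or Lc inserts that replace this region lack the inverted repeat, so they cannot fold into a cleavable duplex in the same way.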

For use in these methods, single-stranded Hc and Lc
populations can be produced by a variety of methods known
to one skilled in the art. For example, the PCR primers of
Example I can be used in asymmetric PCR to generate such
populations. Gelfand et al., "PCR Protocols: A Guide to
Methods and Applications", Ed. by M.A. Innis (1990), which
is incorporated herein by reference. Asymmetric PCR is a
PCR method that differentially amplifies only a single
strand of the double-stranded template. Such differential
amplification is accomplished by decreasing the primer
amount for the undesirable strand about 10-fold compared to
that for the desirable strand. Alternatively, single-stranded
populations can be produced from double-stranded PCR
products generated as described in Example I except that
the primer(s) used to generate the undesirable strand of
the double-stranded products is first phosphorylated at its
5' end with a kinase. The resultant products can then be
treated with a 5' to 3' exonuclease, such as lambda
exonuclease (BRL, Bethesda, MD), to digest away the
unwanted strand.
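The effect of the 10-fold primer imbalance in asymmetric PCR can be pictured with a deliberately crude model: both strands are copied while the limiting primer lasts, after which only the excess primer keeps priming, so the desired strand accumulates linearly rather than exponentially. The sketch below uses arbitrary molecule counts and assumes perfect efficiency and one primer molecule consumed per new strand; it is a toy illustration, not a kinetic model of the reaction.

```python
def asymmetric_pcr(cycles: int, templates: int, limiting: int, excess: int):
    """Toy asymmetric PCR; returns (duplex copies, extra desired single strands).

    Exponential phase: every duplex is copied on both strands, consuming one
    limiting and one excess primer per duplex. Once the limiting primer cannot
    cover a full doubling, only the excess primer extends, adding one
    desired-sense single strand per duplex (via its other strand) per cycle.
    """
    ds, ss = templates, 0
    for _ in range(cycles):
        if limiting >= ds and excess >= ds:
            limiting -= ds
            excess -= ds
            ds *= 2
        else:
            made = min(ds, excess)
            excess -= made
            ss += made
    return ds, ss


# 10-fold excess of the primer for the desired strand (arbitrary units).
ds, ss = asymmetric_pcr(cycles=30, templates=1, limiting=1000, excess=10000)
print(ds, ss)   # 512 9489
```

In this toy run the single-stranded product ends up roughly 18 times more abundant than the double-stranded product.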

Single-stranded Hc and Lc populations generated by the
methods described above or by others known to one skilled
in the art are hybridized to complementary sequences
encoded in the previously described vectors. The
population of sequences is subsequently incorporated
into a double-stranded form of the vector by polymerase
extension of the hybridized templates. Propagation and
surface expression of the randomly combined Hc and Lc
sequences are performed as described in Example I.
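Restriction-free incorporation of this kind can be pictured as string surgery: the single-stranded insert carries arms complementary to sequences flanking the target region of the vector, and after annealing and polymerase extension the vector region between the arms is replaced by the insert. The sketch below models only that end result on hypothetical sequences; the arm length, sequences and function name are assumptions, and annealing efficiency, extension and strand selection are not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def incorporate(vector_sense: str, insert_antisense: str, arm_len: int) -> str:
    """Replace the vector region bounded by the insert's arms with the insert.

    The single-stranded insert is antisense to the vector; its reverse
    complement begins and ends with `arm_len` bases found in the vector.
    """
    insert_sense = revcomp(insert_antisense)
    left, right = insert_sense[:arm_len], insert_sense[-arm_len:]
    i = vector_sense.find(left)
    j = vector_sense.find(right, i + arm_len) if i >= 0 else -1
    if i < 0 or j < 0:
        raise ValueError("insert arms do not match the vector")
    return vector_sense[:i] + insert_sense + vector_sense[j + arm_len:]


left_arm, right_arm = "GACCTGAA", "AGCTTGCA"                  # hypothetical arms
vector = "TTTT" + left_arm + "CCCCCCCC" + right_arm + "TTTT"   # C8 = stuffer region
new_region = "ATGATGAT"                                        # hypothetical V region
insert = revcomp(left_arm + new_region + right_arm)            # single-stranded insert

print(incorporate(vector, insert, arm_len=8))
```

Because the arms, not restriction sites, define where the insert lands, any member of a population sharing the same flanking sequences can be incorporated by the same procedure.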

Although the invention has been described with 
reference to the presently preferred embodiment, it should 
be understood that various modifications can be made 
without departing from the spirit of the invention.
Accordingly, the invention is limited only by the claims. 
SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: HUSE, WILLIAM D. 

(ii) TITLE OF INVENTION: SURFACE EXPRESSION LIBRARIES OF HETEROMERIC RECEPTORS

(iii) NUMBER OF SEQUENCES: 75
(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: PRETTY, SCHROEDER, BRUEGGEMANN & CLARK
(B) STREET: 444 SO. FLOWER STREET, SUITE 200

(C) CITY: LOS ANGELES 

(D) STATE: CALIFORNIA 

(E) COUNTRY: UNITED STATES 

(F) ZIP: 90071 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM : PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: CAMPBELL, CATHRYN A. 

(B) REGISTRATION NUMBER: 31 815 

(C) REFERENCE/DOCKET NUMBER: P31 8882 

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 619-535-9001 

(B) TELEFAX: 619-535-8949 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7445 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AATCCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCCCCCC AAATCAAAAT 60 

ATACCTAAAC ACCTTATTCA CCATTTCCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CCTTCCCACA ATTGCGAATC AACTCTTACA TGCAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATCT TGAGCTACAG CACCAGATTC AGCAATTAAC CTCTAAGCCA 240 

TCTCCAAAAA TCACCTCTTA TCAAAAGCAG CAATTAAAGG TACTCTCTAA TCCTGACCTC 300 

TTGGAGTTTC CTTCCGGTCT GGTTCCCTTT GAAGCTCGAA TTAAAACGCC ATATTTGAAC 360 

TCTTTCCGCC TTCCTCTTAA TCTTTTTCAT GCAATCCCCT TTCCTTCTCA CTATAATACT 420

CACCCTAAAG ACCTGATTTT TCATTTATCG TCATTCTCCT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTCCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTACATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTCGTGTTT 900

CTCGTCAGGC CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960

AATATCCGCT TCTTGTCAAC ATTACTCTTG ATGAAGCTCA CCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATCA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTCGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCCT TTTAGGTTCG TGCCTTCCTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCCATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA CCGACCGAAT ATATCCGTTA 1440

TGCGTGGGCG ATCCTTCTTC TCATTGTCGC CGCAACTATC CCTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAACCT CATAAACCCA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTCT 1740 

CTCTCCAATG CTACAGGCGT TGTAGTTTCT ACTGGTGACG AAACTCACTC TTACGCTACA 1800 

TGGGTTCCTA TTGGGCTTCC TATCCCTGAA AATCAGGGTC GTGGCTCTGA GGGTGGCGGT 1860

TCTGACCGTC GCGCTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TCATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTC TTTATACGGG CACTGTTACT 2100 

CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220

GATCCATTCG TTTGTGAATA TCAACCCCAA TCGTCTGACC TCCCTCAACC TCCTGTCAAT 2280 

GCTGGCGGCC GCTCTGGTGG TGGTTCTCGT GGCCGCTCTC AGGGTGGTGG CTCTGAGGGT 2340 

CCCCCTTCTC AGGCTGGCCG CTCTGAGCGA CCCCCTTCCC GTCGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACCCT AATAACCCCC CTATGACCCA AAATGCCGAT 2460

WO 92/06204    PCT/US91/07149

CAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCCCTAC TGATTACCGT 2520
GCTCCTATCG ATCGTTTCAT TCCTCACCTT TCCGCCCTTG CTAATGGTAA TGCTGCTACT 2580
GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACCCTCA TAATTCACCT 2640 
TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCCCCCT 2700 
TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 
TTCCCTGCTC TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 
TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 
TATTATTGCG TTTCCTCGCT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 
TTAAAAACGC CTTCGGTAAG ATAGCTATTG CTATTTCATT CTTTCTTGCT CTTATTATTG 3000 
GGCTTAACTC AATTCTTCTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 
TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 
TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACCTTAAACA AAAAATCGTT TCTTATTTGG 3180 
ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240 
CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTC GGTGCAAAAT AGCAACTAAT 3300 
CTTCATTTAA GGCTTCAAAA CCTCCCCCAA CTCCCCACCT TCGCTAAAAC GCCTCGCGTT 3360 
CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTCCTTC CTATTGGGCG CGCTAATGAT 3420 
TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCCATC AGTGCGGTAC TTGGTTTAAT 3480 
ACCCCTTCTT GGAATGATAA GGAAACACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 
AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTCA TAAACAGCCG 3600 
CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCCTC TGGACAGAAT TACTTTACCT 3660 
TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATCG CGATTCTCAA TTAAGCCCTA CTGTTCAGCG TTGGCTTTAT 3780 

ACTGGTAAGA ATTTCTATAA CCCATATCAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 

TCCGCTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTCTTGCCA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAACCCC 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080

CAGCCTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCCACCATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TCAAATTCTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TCCGCCATTT 4320 

TCTAACTTGG TATTCAAAGC AATCAGGCGA ATCCCTTATT GTTTCTCCCG ATGTAAAAGG 4380 

TACTGTTACT CTATATTCAT CTGACCTTAA ACCTGAAAAT CTACCCAATT TCTTTATTTC 4440 

TGTTTTACCT GCTAATAATT TTCATATGGT TGCTTCAATT CCTTCCATAA TTCAGAACTA 4500
TAATCCAAAC AATCAGGATT ATATTGATCA ATTGCCATCA TCTGATAATC AGGAATATGA
TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 
TTTTAAAATT AATAACGTTC GGGCAAACGA TTTAATACGA GTTGTCCAAT TGTTTGTAAA 
GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTCAC GGCTCTAATC TATTAGTTGT 
TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 
AACTCACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 
TTTTTCATTT GCTGCTGGCT CTCAGCGTCG CACTCTTGCA GGCGGTGTTA ATACTGACCG 
CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCCCT ATTTTTAATG GCGATGTTTT 
AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 
TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATC TCCCTTTTAT 
TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 
TCAAAATGTA GCTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG CTAATATTGT 
TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA CTGATGTTAT 
TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGCACAGA CTCTTTTACT 
CGCTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCCT TCCTGTCTAA 
AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACCAGG AAAGCACGTT 
ATACGTGCTC GTCAAAGCAA CCATAGTACG CCCCCTGTAG CGGCCCAITA AGCGCGGCGG 
GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCC CCCGCTCCTT 
TCGCTTTCTT CCCrTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 
GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACCGCA CCTCGACCCC AAAAAACTTG 
ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTCTTCCA AACTGGAACA AGACTCAACC 
CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCCCAA CCACCATCAA 
ACAGGATTTT CGCCTGCTGC GGCAAACCAG CCTGGACCGC TTCCTCCAAC TCTCTCAGGG 
CCAGGCGGTG AAGCGCAATC ACCTCTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 
GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 
ACCACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 
TCACTCATTA GCCACCCCAG CCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 
TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCCTT TTACAACGTC 
GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTCTACAT GGAGAAAATA AAGTGAAACA 
AAGCACTATT GCACTCGCAC TCTTACCGTT ACCGTTACTG TTTACCCCTG TCACAAAAGC 
CGCCCAGGTC CAGCTGCTCG AGTCACCCCT ATTCTGCCCA GGGGATTGTA CTACTGGATC 
CTAGGCTCAA GGCGATCACC CTGCTAAGGC TCCATTCAAT AGTTTACAGG CAAGTCCTAC 
TGAGTACATT GGCTACGCTT CCGCTATGCT AGTACTTATA CT7CCTCC7A CCATACCCAT 
TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAACCA ATACCCAAGA GGCCCGCACC
CATCCCCCTT CCCAACAGTT CCGCAGCCTC AATGGCCAAT CGCGCTTTGC CTGGTTTCCG 
GCACCAGAAG CGGTGCCGGA AAGCTGGCTG GAGTGCGATC TTCCTGAGGC CCATACCCTC 
GTCGTCCCCT CAAACTGGCA GATCCACGGT TACGATGCGC CCATCTACAC CAACGTAACC 
TATCCCATTA CGGTCAATCC GCCGTTTGTT CCCACGGAGA ATCCGACGGG TTCTTACTCG 
CTCACATTTA ATGTTGATGA AAGCTCGCTA CACGAAGGCC ACACCCCAAT TATTTTTGAT 
GGCGTTCCTA TTGGTTAAAA AATGAGCTCA TTTAACAAAA ATTTAACGCG AATTTTAACA 
AAATATTAAC GTTTACAATT TAAATATTTG CTTATACAAT CTTCGTCTTT TTGGGGCTTT 
TCTGATTATC AACCGGGGTA CATATCATTC ACATGCTAGT TTTACGATTA CCCTTCATCG 
ATTCTCTTGT TTGCTCCAGA CTCTCAGGCA ATGACCTGAT AGCCTTTCTA CATCTCTCAA 
AAATAGCTAC CCTCTCCGGC ATTAATTTAT CAGCTAGAAC GGTTCAATAT CATATTGATG 
GTGATTTGAC TGTCTCCGGC CTTTCTCACC CTTTTGAATC TTTACCTACA CATTACTCAG 
GCATTGCATT TAAAATATAT GAGGGTTCTA AAAATTTTTA TCCTTGCGTT GAAATAAAGG 
CTTCTCCCGC AAAAGTATTA CAGGGTCATA ATGTTTTTGG TACAACCGAT TTAGCTTTAT 
GCTCTGAGGC TTTATTGCTT AATTTTCCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG 
ACGTT 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7317 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
AATCCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 
ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATCTATCTA ATGCTCAAAC TAAATCTACT 
CGTTCCCAGA ATTGCGAATC AACTGTTACA TGGAATCAAA CTTCCAGACA CCGTACTTTA 
CTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 
TCCGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTC 
TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 
TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 
CAGGCTAAAG ACCTCATTTT TGATTTATCG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 
TTTCACCGCC ATTCAATGAA TATTTATGAC CATTCCGCAC TATTGGACCC TATCCAGTCT 
AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAACCCTC TCCCTATTTT 
CGTTTTTATC CTCGTCTGCT AAACGAGGCT TATGATAGTG TTCCTCTTAC TATGCCTCGT 
AATTCCTTTT GGCCTTATCT ATCTCCATTA GTTCAATGTG GTATTCC7AA ATCTCAACTC 
ATCAATCTTT CTACCTCTAA TAATCTTCTT GCGTTAGTTG G7TTTATTAA CCTACATTTT
TCTTCCCAAC GTCCTCAGTG GTATAATCAC CCACTTCTTA AAATCCCATA AGCTAATTCA 
CAATCATTAA ACTTGAAATT AAACCATCTC MCCCCAATT IACTACTCGT TCTGCTGTTT 
CTCGTCAGCG CAAGCCTTAT TCACTGMTG AGCAGCTTTC TTACGTTCAT TTGGGTAATC 
AATATCGCGT TCTTGTCAAG ATTACTCTTG AXGAAGGTCA GGGAGGCTAT GCGCCTGGTC 
TGTAGAGGGT TCATCTGTCC TGITTGAAAG TTGGTCAGTT CGCTTCCCTT ATGATTGAGC 
GTGIGGGCGT GGTTGGGGGT AAGTAACATG GAGCAGGIGG CGGATTTGCA CACAATTTAT 
GAGGGGATGA TACAAATCTC GCTTGTACT TCTTTCCCCC TTGCTATAAT CGGTGGGGGT 
CAAAGATGAG TGTTTTAGTG TATTCTTTCC CCTCTTTCGT TTTAGGTTGG TGCCTTCCTA 
GTGGCAITAC GIATTTTAGG GGTTTAATGG AAACTTGGTC ATGAAAAAGT CTTTAGTCCT 
CAAACXTCT GTAGCCGTTG CTAGGGTGGT TCGCATGGTG TCTCTCGCTG GTGAGGGTGA 
CGATCCCCCA AAAGCGGCGT TTAACTCCCT GGAAGCGIGA GGGACGGAAI ATAIGGGTTA 
TGGGTGGGCG AIGGTOTTG TCATTGTCCC CCCAACTATC GGIATGAAGC TCTTTAACAA 
ATTCACCTCG AAAGCAAGGI GAIAAACGGA TACAATTAAA GCCTCCTTTT GGAGCCTTTT 
TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTGGTTTC 
TATTCTCACT CCCCTGAAAC TGTTGAAAGT TGTTTAGGAA AACCCCATAC ACAAAATTCA 
TTTACTAACG TGTGGAAAGA CGACAAAACT TTACATCGTT AGGCTAACTA TGAGGGTTGT 
CTCTGGAATG CTAGAGGCGT TCTAGTTTGT ACTGGTGACC AAACTCAGTC TTACGCTACA 
TGCGTTCCTA TTGGGCTTGG TATGCCTGAA AATGAGGGTG OIGGGTGTGA GCGTGGCGCT 
TGIGAGGGTG GGGGTTCTGA GGGTGGGGGT ACTAAACCTC CTGAGTAGGG TCATACACCT 
ATTCCGGGGT ATACTTATAT CAAGGGICTG GACGGCACTT ATCCGCCTGC TAGTGACGAA 
AACCCCCCTA ATCCTAATCC TTCTCTTGAG GAGTCTGAGG CTCTTAATAC TTTCATCTTT 
CAGAATAATA GGTTGGGMA TAGGCAGGGG GGATTAAGIG TTTATACGGG GACTGTTACT 
GAAGGCACTG ACCCCGTTAA AACTTATTAC CACTACACTG GTGTAICATG AAAACCCATC 
TATGACCCTT ACIGGAACGG TAAATTGAGA CACTCCGCTT TCCATTCTGC CTTTAATCAA 
GATCGATICG TTTGIGAAIA ICAAGGCGAA TCCTCTGACC TCCCTCAACC TCCTGTCAAT 
GCTCGCGGCG GCTCTGGTGG TGGTTCTGGT GGGGGGTGTG AGCGTGGTGC CTCTCAGGCT
GGGGGTTCTG AGGGTGGGGG GTGIGAGGGA GGCGCTTCCG GTGGIGGGTG TGGTTGGGGT 
CATTTTGATT ATGAAAAGAT GGGAAAGGGT AATAAGGGCG CTATCACCCA AAATGCCGAT 
GAAAACGGGG TACACTCTGA CGCTAAACCC AAACTTGATT CTGICGCTAC TGATTAGGGT 
GGTGCTATCG ATCGTTTCAT TGGTGACGTT TGCGGGCTTG CTAAIGGTAA TGGTGGTAGT 2580
CCTCATTTTG GTGGGTGTAA TTCCCAAATC GCICMGTGC CTCACCGTGA TAATTCACCT 2640
TTAATGAATA ATTTCCGTCA ATATTTACCT TGGCTGCCTG AATCGGTTGA ATGTCCCCCT 2700
TTTGTGTTTA CCCCTCCTAA ACCMATCAA 7TTTCTATTG ATT5TCACAA MT-AACT-, ~„ 
TTCCCTCCTC TCTTTGCGTT TCTTTTATAT CTTCCCACCT TTATGTATGT ATTTTCTACC
TTTGCTAACA TACTGCCTAA TAACGAGTCT TAATCATGCC ACTTCTTTTC GCTATTCCGT
TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTC CTTACTTTTC
TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG
CCCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTCACT
TTGTTCAGGG TGTTCAGTTA ATTCTCCCCT CTAATGCGCT TCCCTGTTTT TATGTTATTC
TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG
ATTGGGATAA ATAATATGGC TCTTTATTTT GTAACTGGCA AATTAGGCTC TCGAAAGACC
CTCCTTAGCC TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT ACCAACTAAT
CTTGATTTAA GCCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT
CTTAGAATAC CGGATAAGCC TTCTATATCT CATTTGCTTC CTATTGGGCG CGGTAATGAT
TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT
ACCCGTTCTT GGAATGATAA CGAAACACAC CCGATTATTG ATTCCTTTCT ACATGCTCGT
AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG
CGTTCTGCAT TAGCTGAACA TGTTCTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT
TTTGTCGGTA CTTTATATTC TCTTATTACT CGCTCCAAAA TGCCTCTGCC TAAATTACAT
GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGACCC TTGCCTTTAT
ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT
TCCGCTCTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA
AATTTAGGTC AGAACATCAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACCCGTTCTT
TGTCTTGCGA TTGGATTTCC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG
GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT
CAGCGTCTTA ATCTAAGCTA TCGCTATCTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT
AGCGACCATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTCTTTCC
ATTAAAAAAG GTAATTCAAA TGAAATTCTT AAATGTAATT AATTTTGTTT TCTTGATGTT
TGTTTCATCA TCTTCTTTTG CTCACCTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT
TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT CTTTCTCCCG ATGTAAAAGG
TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC
TCTTTTACCT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA
TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA
TGATAATTCC GCTCCTTCTG GTCGTTTCTT TGTTCCGCAA AATGATAATC TTACTCAAAC
TTTTAAAATT AATAACGTTC GGGCAAAGCA TTTAATACGA GTTGTCGAAT TCTTTGTAAA
GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT
TACTCCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTCCC
GTGAGGAG ATATTCATTC AGGGHTGAT AmCACC „ ^ ATCCT7TACA
TTTTTCAnr GGTGGTGGGT GTGAGGGTGG CACTCTTCCA GGGGGTGTTA ATAGTGAGGG 
CCTCACCTCT GTTTTATCTT GTGGTGGTGG TTCGTTCCCT AnTTTAATC GGGATGrm 
AGGGGTATGA OT C=CCCAT TAAACACTAA TAGCCATTCA AAAATATTCT CTGTCGCAGG 
TATTCTTACC GTTTCAGGTG AGAAGGGTTC TATCTCTGTT GGGGAGAATG TCCCTTTTAT 
TACTCCTCCT GTGACTGGTG AATCTCCCAA TGTAAATAAT CCATTTCAGA CGATTCACCG 
TCAAAATCTA CGTATTTGCA TGAGCGTTTT TCCTCTTCCA ATGGCTGGGG GTAATATTCT 
TCTCGATATT AGGACGAAGG CCGATAGTTT GACTTCTTCT ACTCAGCCAA CTCATGTTAT 
TAGTAATGAA AGAAGTATTG CTACAACCCT TAATTTGGGT GATGGACACA GTGTTTTACT 
CGGTGGGGTG ACTGATTATA AAAACACTTC TCAAGATTCT CGCGTACCGT TCCTCTCTAA 
AATCCCITTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TGGAACGAGG AAAGGACGTT 
ATACGTCCTC CTCAAAGCAA CCATAGTACC CGGGGTGTAG GGGGGGATTA AGCGCGGCGC 
GTGTGGTGGT TAGGGGGAGG GTCAGGGGTA GAGTTGGGAG GGGGGTAGGG CCCGCTCCTT 
TCCCTTTCTT GGGTTGGHT GTCGGGAGGT TCGCCGCCTT TCCCCGTCAA GCTCTAAATC 
GGGGCGTGCG TTTAGGG7TC CCATTTAGTG CTTTACGGCA GGTGGAGGGC AAAAAACTTC 
ATTTGGGTGA TGGTTGAGGT AGTGGGGGAT CGCCCTGATA CACGGTTTTT CGCCCTTTGA 
CGTTGGAGTC CACGTTCT7T AATAGTGGAC TCTTGTTCCA AAGTGGAAGA ACACTCAACC 
GTAXCTGGGG G.ATTGT^ CATTTATAAC GCATTTTGCC GATTTCGGAA CCACCATCAA 
AGAGGATTTT GGGGTGCTGG CGCAAACGAG GGTGGAGGGG TTCCTGCAAC TCTCTCAGGG 
GGAGGGGGTG AAGGGGAATC AGCTCTTGCC GGTGTGGGTG GTGAAAACAA AAAGCACCCT 
GGCGGGGAAX AGGGAAAGGG GGTGTGGCGG CCCGTTGGCC GATTCATTAA TGGAGGTCGC 
ACGAGAGGTT TCCCCACTCG AAACGGGGGA CTGAGCGCAA CGCAATTAAT CTCAGTTAGC 
VHCCCCK GCtTIACACr TTATCCTTCC GCCTCGTATC TTCTGTGCAA 
TTGTGAGGGG ATAACAATTT CACACGCCAA GGAGAGAGTG ATAATGAAAT ACCTATTGCC 
TAGGGGAGGG CCTGGATTGT TATTACTCGC TGCCCAACCA GGGATGGGGG AGCTCCTCAT 
GAGGGAGAGT CCAGATATCC AAGAGGAATG AGTGTTAATT GTAGAAGGGG TCACTTGGCA 
CTGCCCCTCC TTCTACAACG TGGIGACTGG GAAAACGGIG GGGTIAGGGA ACCTTAATCC 
CCTTCCAGAA rTCCCTTTCG CCACCTCGCC TAATAGCGAA GAGGGGGGGA GGGATGGGCC 
TTGGGAAGAG TTGGGGAGGG TGAATGGGGA ATGGCGCTTT CCCTGCTTTC CGGGAGGAGA 
AGGGGTGGGG CAAAGCTGGC TGGAGTGCCA TGTTGGTGAG CCCCATACGG TCGTCCTCCC 
CTCAAACTGG GAGATGGAGG GTTAGGAIGC GCGCATGTAG ACCAACCTAA CCTATCC- 
TACGGTGAAT CCGCCCTTTC TTGGCAGGGA GAATGGGAGG GGTTGTTACT CCCTCACA"T 
TAATGTTGAT GAAAGGTGGC XAGAGGAAGG CCAGACGGGA ATTATTTTTG ATGCCCTTCC 678 „ 
:ATTGGTTAA aaaatgagct CATTTAACAA a«TTTAACC gg«~-, > r^->-. 
ATCTTCCTGT TTTTGGGGCT TTTCTCATTA
GTTTTACGAT TACCCTTCAT CCATTCTCTT
ATAGCCTTTG TAGATCTCTC AAAAATAGCT
ACGGTTGAAT ATCATATTGA TCGTGATTTG
TCTTTACCTA CACATTACTC AGGCATTGCA 7140
TATCCTTGCG TTGAAATAAA GGCTTCTCCC 7200
GGTACAACCG ATTTAGCTTT ATGCTCTGAG 7260
CCTTGCCTGT ATGATTTATT CGATCTT 7317

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7729 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
AATCCTACTA CTATOCIAC AATTCATCCC ACCTTTTCAC CTCOCCCCCC AAATCAAAAT 
ATACCTAAAC AGGTTATTGA CCATTTGCCA AATCTATCTA ATCGTCAAAC TAAATCTACT 
CGTTCGCAGA ATTGCGAATC AACICTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 
CTTGGATATT TAAAACATGT TGAGCTACAG CACCACATTC AGCAATTAAG CTCTAAGGCA 
TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGC TACTCTCTAA TCCTGACCTG 
TTGGACTTTG CTTCGGGTCT CCTTCCCTTT GAACGTGGAA TTAAAAGGGG ATATTTCAAG 
TCTTTCGGGC TTCCTCTTAA TCTTTTTCAT GCAATGCGCT TTGCTTCTCA CTATAATAGT 
CAGGGTAAAG ACCTGATTTT TCATTTATGC TCATTCTCCT TTTCTGAACT GTTTAAAGGA 
TTTGAGGGGG ATTCAATCAA TATITATCAG CATTCCGCAG TATTGGACGC TATCCAGTCT 
AAACATTTTA CTATTACCCC CTCTGGCAAA AGTTCTTTTG CAAAAGCCTC TCGGTATTTT 
GGTTTTTATG GTCGTGTGGT AAAGGAGGGT IATGATAGTG TTCCTCTTAC TATGCCTCGT 
AATTCCTTTT GGCCTTATGT ATCTCCATTA GTtGAATGTG GTATTCCTAA ATGTCAAGTG 
ATGAATCTTT GTACGTGTAA TAATCTTCTT CCCTTAGTTC GTTTTATTAA CGTAGATTTT 
TCTTGCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATGGCATA AGGTAATTCA 
CAATGATTAA AGTTGAAATT AAACCATGTG AACCCCAATT IAGTACTGGT TGTGGTGTTT 
CTGGTGAGGG CAAGCCTTAT TCACTGAATG AGGAGGTTTG TTACGTTGAT TTGGGTAAIG 
AATATGCGGT TGTTGTGAAG ArTACTCTTC ATGAACCTCA GCGAGGGTAT GCGCCIGCTC 
TGTACACCGT TCATGTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCGGTT ATGATTGACC 
OTGTGCGCCT CGTTGCCCCT AAGTAAGATG GAGCAGGTCG CCCATT7CCA CACAA7TTA T 
CAGCCGATGA TACAAATCTC CCTTCTACTT TGTTTCCCGC TC 
CAAAGATCAC TCTnTACTG TATTCTTTCC GCTCTTTGGT TTTAGCTTCC TCCCTTCCTA
CTGGCATTAC GTATTTTAGG GGTTTAATGG AAAGTTGGTC ATCAAAAAGT CTTTACTCCT 
GAAAGGGTGT GTAGGGGTTG CTAGCCTCCT TCGCATGCTG TCTTTCGGTG CTGAGGGTGA 
CGATCGCCCA AAAGCGGGCT TTAACTCCCT GCAACCCTCA GCGAGGGAAT ATATCGGTTA 
TGGGTGGGGC ATCGTTGTTC TCATTGTCCC CGCAACTATC GCTATCAAGG TCTTTAAGAA 
ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCC7TTT GCAGCCTTTT 
TTTTTCCACA TTTTCAACCT GAAAAAATTA TTATTCCCAA TTCCTTTAGT TGTTCCTTTC 
TATTCTCACT CCGCTCAAAC TGTTGAAAGT TCTTTACCAA AACCCCATAC ACAAAATTCA 
TTTAGTAAGG TCTGGAAAGA CCACAAAACT TTACATCGTT ACGCTAACTA TGAGGGTTGT 
CTGTGCAATG CTACAGGCGT TCTAGTTTGT ACTGGIGAGG AAACTCACTC TTAGGGTAGA 
TCGGTTCCTA TTGGGCTTGC TATGGGTGAA AATGAGGGTG GTGGCTCTGA GGGTGGGGGT 
TGTGAGGGTG GCGGTTCTGA GGGTGGGGGT ACTAAACCTC GTGAGTACGG TGATAGAGCT 
ATTCCCGGCT ATACTTATAT GAACGCTCTG GACGGGAGTT ATGCGCGTGG TACTGAGGAA 
AAGGCGGCTA ATGCTAATGC TTCTCTTGAC CACTCTCAGC CTCTTAATAC TTTCATCTTT 
CAGAATAATA GGT7CGGAAA TACGCAGGGC GGATTAAGTG TTTATACGGC CACTGTTACT 
CAAGGCAGTG AGCGCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGGGATG 
TATCACCCTT ACTGCAACGG TAAATTCAGA CACTGGGCTT TCCATTCTGG CTTTAATGAA 
GATCGATTGG TTTCTCAATA TGAAGGCCAA TCGTCTGAGC TGCGTGAACC TCCTGTCAAT 
GGTGGCGGCG GCTCTGGTGG TGGTTGTGGT GGGGGCTCTG AGGGTGGTGG GTCTGAGGGT 
GGCGGTTCTG AGGGTGGGGG CTCTGAGGGA GGGGGTTGCG CTCGTCGCTC TGGTTGCGGT 
GATTTTGATT ATGAAAAGAT GCCAAACGCT AATAAGGGGG CTATGAGGGA AAATGGGGAT 
GAAAACGCGC TAGAGTCTGA CGCTAAAGGC AAACTTGATT CTCTCGCTAC TCATTACCCT 
CCTGCTATGG ATCGTTTCAT TGCTCACCTT TCCCGCCTTC CTAATGGTAA TGGTGGTACT 
CGTCATTTTG GTCGGTCTAA TTCCCAAATG GGTCAAGTCG GTGAGGGTGA TAATTCACCT 
TTAATGAATA ATTTCGGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGGCGT 
TTTGTCTTTA GCGGTGGTAA AGGATATGAA TTTTCTATTC ATTGTGACAA AATAAACTTA 
TTCCGTGGTG TCTTTGGGTT TCTTTTATAT GTTGGCAGGT TTATGTATCT ATTTTCTACG 
TTTCCTAACA TAGTGCGTAA TAAGCAGTGT TAATCATCCC AGTTCTTTTC CGTATTCCGT 
TATTATTCCC TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CCCCTATCTG CTTACTTTTC 
TTAAAAAGGG GTTCGGTAAG ATAGCTATTC CTATTTGATT CTTTCTTCCT CTTATTATTG 
GGGTTAAGTC AATTCTTCTC GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 
TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT GTAATGCGCT TCCCTCTTTT TATGTTATTC 
TCTCTGTAAA GGCTGCTAT7 TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGC 3U0 
ATTGGGATAA ATAATATCCC TCTTTATTT7 CTAACTCCCA ,,TT,GGCT: ?CG***G*~~ 
CTCCTTAGCC TTCGTAAGAT TCAGGATAAA ATTGTAGCTC GGTGCAAAAT AGCAACTAAT
CTTGATTTAA CGCTTCAAAA CCTCCCCCAA GTCGGGAGGT TCGCTAAAAC GCCTCCCCTT 
CTTAGAATAC CGCATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 
TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 
ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 
AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACACGCC 
CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 
TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 
GTTGGCGTTG TTAAATATCG CGATTCTCAA TTAAGCCCTA CTGTTGAGCC TTGGCTTTAT 
ACTGGTAAGA ATTTCTATAA CCCATATCAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 
TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 
AATTTACC7C AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 
TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA AGCTAAGCCG 
GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 
CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 
ATTAAAAAAC GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTCATGTT 
TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCCCGATTT 
TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCCTTATT GTTTCTCCCG ATCTAAAAGG 
TACTCTTACT CTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 
TGTTTTACGT CCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 
TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 
TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 
TTTTAAAATT AATAACCTTC GGCCAAACGA TTTAATACGA GTTCTCGAAT TGTTTGTAAA 
GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 
TAGTCCACCT AAAGATATTT TAGATAACCT TCCTCAAITC CTTTCTACTC TTGATTTGCC 
AACTCACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 
TTTTTCATTT GCTGCTGCCT CTCAGCGTGG CACTGTTGCA GGCGCTGTTA ATACTGACCG 
CCTCACCTCT GTTTTATCTT CTCCTGGTGG TTCGTTCCGT ATTTTTAATG CCGATGTTTT 
AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTCCCACC 
TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT
TACTCGTCGT GTGACTCGTG AATCTCCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCC 
TCAAAATGTA CGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATCGC7GGCC CTAATATTCT 
TCTCCATATT ACCACCAAGG CCCATAGTTT GAGTTCTTCT ACTCACGCAA GTCATC77- 
TACTAATCAA ACAACTATTG CTACAACCCT TAATTTCCGT GATCGACAGA CTCTTTTACT 
CCGTGCCCTC ACTCATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 
AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCCCTCTGAT TCCAACGAGG AAAGCACGTT 
ATACGTGCTC CTCAAAGCAA CCATAGTACG CGCCGTCTAG CGGCGCATTA AGCGCGGCGG 
GTGTGGTGGT TACCCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 
TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 
CGCGGCTCCC TTTAGCGTTC CGATTTAGTC CTTTACGGCA CCTCGACCCC AAAAAACTTG 
ATTTGCGTGA TGGTTCACGT ACTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 
CGTTCGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTCGAACA ACACTCAACC 
CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 
ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCCC TTCCTCCAAC TCTCTCAGGG 
CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCCCTC GTCAAAAGAA AAACCACCCT 
GGCGCCCMT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTCGC 
ACGACAGGTT TCCCCACTGG AAAGCGGGCA CTGAGCGCAA CGCAATTAAT GTGAGTTAGC 
TCACTCATTA CGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATC TTGTGTGCAA 
TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT CGCCGTCGTT TTACAACGTC 
GTCACTGGGA AAACCCTCGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTCAAACA 
AAGCACTATT GCACTGCCAC TCTTACCGTT ACTGTTTACC CCTGTCGCAA AAGCCCAGGT 
CCAGCTGCTC CACTCCCTCT TCCCCCTGGC ACCCTCCTCC AAGAGCACCT CTGGGGGCAC 
AGCCGCCCTG GGCTCCCTCG TCAACACTAA TTCCCCGAAC CGGTGACGGT GTCGTGGAAC 
TCACGCGCCC TGACCACCGG CGTGCACACC TTCCCGGCTG TCCTACAGTC CTCAGGACTC 
TACTCCCTCA GCAGCGTGGT GACCGTGCCC TCCAGCAGCT TGGGCACCCA GACCTACATC 
TGCAACGTGA ATCACAAGCC CACCAACACC AACGTCGACA AGAAAGCAGA CCCCAAATCT 
TGTACTAGTC GATCCTACCC GTACGACGTT CCGGACTACG CTTCTTAGGC TGAAGGCGAT 
GACCCTGCTA AGGCTGCATT CAATAGTTTA CAGCCAAGTG CTACTGAGTA CATTGGCTAC 
GCTTGGGCTA TGGTAGTAGT TATAGTTGGT GCTACCATAG GGATTAAATT ATTCAAAAAG 
TTTACGAGCA AGGCTTCTTA AGCAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC 
AGTTGCGCAG CCTGAATGCC CAATGCCGCT TTCCCTGGTT TCCGGCACCA GAAGCGGTGC 
CGGAAAGCTG GCTGGAGTGC GATCTTCCTG AGGCCGATAC GGTCGTCGTC CCCTCAAACT 
GGCAGATGCA CGGTTACGAT CCGCCCATCT ACACCAACGT AACCTATCCC ATTACGGTCA 
ATCCCCCGTT TGTTCCCACG GAGAATCCGA CGCGTTGTTA CTCGCTCACA TTTAATCTTC
ATGAAACCTC GCTACACCAA GCCCAGACGC GAATTATTTT TGATCGCGTT CCTATTGGTT 
AAAAAATGAC CTCATTTAAC AAAAATTTAA CGCGAATTTT AACAAAATAT TAACGTTTAC 
AATTTAAATA TTTCCTTATA CAATCTTCCT CTTTTTGGGG CTTTTCTCAT TATCAACCCC 
ATTACCCTTC ATCGATTCTC TTGTTTCCTC 7380
TCTACATCTC TCAAAAATAG CTACCCTCTC 7440
ATATCATATT CATCGTCATT TCACTCTCTC 7500
TACACATTAC TCAGGCATTG CATTTAAAAT 7560
CGTTGAAATA AAGGCTTCTC CCCCAAAAGT 7620
CGATTTAGCT TTATGCTCTG AGGCTTTATT 7680
GTATGATTTA TTCCACGTT 7729

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
AATCCTACTA CTATTAGTAG AATTCATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 
=c AGGTTATTGA CCATTTCCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 

zzz r GGAATc tggaatgaaa cttccagaca ^ 

CTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAC CTCTAAGCCA 

~ tgacctctta tcamaggag — — 

TTCCAGTTTC CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCC ATATTTGAAG 

™: TcmTGAT gcaatccgct — 

CAGGGTAAAG ACCTGATTTT TCATTTATCC TCATTCTCGT TTTCTGAACT GTTTAAAGCA 
=G ATTCAATGAA TATTTATGAC Gaitccgcag 

MACATTTTA CTATTACCCC CTCTCCCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 
™ A TC GTCGTCTGGT AAACGAGCGT TATGATAGTG TTGCTCTTAC TATGCcZ 

GGCGTTATGT ATCTGCAm G1TGAATGTG ATCTCAACTG 
ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC CTTTTATTAA CGTAGATTTT 
TCTTCCCAAC GTCCTGACTG GTATAATCAC crtrrrr^ ^GATTTT 
CAATCATTAA .rrrr ^ATAATCAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 

cZT TACTACTCGT TCTGGTGTTT 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 

™ TCTTGTCAAG ATTACTCTTG ATGAAGGTGA GCCAGCCTAT CCCC ^ 

cgT 0 " TCTTTCAMG CGCTTCCCTT — 

l^ CCCCCT ^^ — G CCGATTTCCA CAC.TTTAT 
ACGCGATG. TACAAATCTC CCTTCTACTT TCTTTCGCCC TTGGTATAAT CGCTGCGGGT 
CAAAGATGAC TCTTTTAC7C TATTCTTTCG CCTCTTTCGT 
CTCCCATTAC CTATTTTACC CGTTTAATGG AAACTTCCTC AIGAAAAACT CTTTACTCCT 
CAAAOCCTCT GTAGCGGTTG GTACOCTGGT TCCGATGCTG TCTTTCCCTG CTGAGGGTGA 
CGATCCCCCA AAAGCGGCGT TTAACTCCCT CCAACCCTCA GCGAGCGAAT ATATCCGTTA H*0 
TGCGTGGGCG ATGGTTGTTG TCATOTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 
ATTCACCTCG AAAGGAAGCT GAIAAAGCGA TACAATTAAA CGCTCCTTTT GGAGGGTTTT 1 5 60 
TTTTTGGAGA TTTTCAACCT GAAAAAATTA TTATTGGGAA TTCCTTTAGT TGTTCCTTTC 1620 
TATTCTCAGT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCAIAG ACAAAATTCA 
TTTACTAACC TCTGGAAAGA CGAGAAAACT TTAGATCGTT ACCCTAACTA TCAGGGTTGT 
CTGTCGAATG GTACAGGGGT TGIAGTTTGT AGTGCTGACG AAACTCAGTG TTACGGTACA 




IZZIT. TOCGCTTGC TAICCCT0AA AATGAGCGTC GTG< ™ I860 


GATCGATIGG TTTGTGAATA TGAACGCGAA TGGTCTGACG TGCCTCAACC TCGTGTGAAT 



C ! G0TTCTGA ° CGTGGCGCT ACTAAACCTC ™»««« TGATACACCT 1920 

"' HGTATCATC AAAACCCATG 2160 

TATGAGGGTT AGTGGAAGGG TAAATTCAGA GAGTGCGGTT TCCATTCTCC GTTTAATGAA 




"I"!!!! 1 " ACrTAIAI CAACCCTCIC CACGGCACTT ATCCCCCTG0 TAGTGAGCAA . 1980 

™w„ w *, m AI »ccGG CACTCTTACI 2100 
GAAGGGACTG ACCCCGTTAA AACTTA1TAC CAGTACAGTC GTGTATCATG AAAAGGGATG 



^ UU ^, U AiLCUCCTGG TACTGAGCAA 
AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 20,0 
CACAATAATA CGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTCTTACT 




GGTGGGGGGG GGTCTGGTGG TGGTTCTGCT GGGGGGTCTG AGGGIGGTGG GTGTGAGGGT 2*0 

CGGGGTTGTG AGGGTGGGGG CTGTGAGGGA GCCGGTTCCG GTGGTGGGTG TGGTTCGGGT 2.00 
CATTTTGATT ATGAAAAGAT GGCAAAGGGI AATAAGGGGG CTATGAGGGA AAATGCCCAT 
GAAAAGGCGC TACAGTCTGA CGCTAAAGCC AAAGTIGATT CIGTCGGTAG TGATTACGCT 
GCTGCIAICG ATGGTTTGAT TCCTCACGTT TCCCCCCTTC GIAAIGCTAA TGGIGGIACT 
GGTGAmTC GTCGCTCTAA TTCCCAAATG GCTCAAGTCG GIGAGGGTGA TAATTCACCT 
TTAATCAATA ATTTCGGTGA ATATTIACGT TCCCTCCCT" AATGGGTIGA AIGIGGGGGT 
TTTGTCTTTA GCGCTGGTAA ACCATATCAA rmCTATTG ATTCTCACAA AATAAAGTTA 
TTCCGTGGTG TGTTTGGGn TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACC 2820 
TTTCCTAACA TACTCCGTAA TAAGGAGTCT TAATCATCCC ACTTCTTTTG GCTATTCCGT 2880 
TATTATTCCG TTTCCTCCGT TTCGTTGTGG TAACTTTCTT CGGGTAICTC CTTACTTTTC 2M0 
TTAAAAAGGG CTTCGGTAAG ATAGGTATTG GGIGTTTCTT GGTCTTATTA TTGCGCTTAA 3000 
CTCAATTCTT GTGGCTTATC TGTGTGATAT TAGCGOTGAA TTAGGCTGTG ACTTTGTTCA 3060 
GGGTGTTCAG TTAATTCTCC CGTGTAATGG GCTTCCCTGT TTTTATCTTA TTGTCTCTGT 3120 
AAAGCCTCCT ATTTTGATTT TTGACGTTAA ACAAAAAATC CTTTCTTATT TGGATTGGGA 3U0 
TAAATAAIAT GGC7GTTTAT TTTGTAACTC CCAAATTAGC CTGTGGAAAG ACGGTGGTTA 3240 

GCCT7GCTAA GATTCAGGAT AAAATTGTAG '■-rrr-'v- • • 

■wMiui^o „ * GGG xoCr-A AATAGCAAC7 AATCT7GA77 
TAAGGCTTCA AAACCTCCCG CAAGTCCGGA GGTTCCCTAA MCCCCTCGC GTTCTTACAA 
TACCGGATAA GCCTTCTATA TCTGAITTGC TTGCTATTGG GCGCGGTAAT GATTCCTACG 
ATCAAAATAA AAACGGCTTG CTTGTTCTCG ATGAGTGCGG TACTTGGTTT AATACCCGTT 
CTTCGAATGA TAAGGAAAGA CAGCCGATTA TTGATTGGTT TCTACATGCT CGTAAATTAG 
GATGCGATAT TATTTTTCTT GTTCAGCACT TATCTATTGT TGATAAACAG GCGCGTTCTG 
CATTACCTCA ACATGTTGTT TATTGTCGTC GTCTGGACAG AATTACTTTA CCmTCTCG 
GTACTTTATA TTCTCTTATT ACTGGCTCGA AAATGCCTCT GCCTAAATTA CATGTTGGCG 
TTGTTAAATA TGGCGATTCT CAATTAAGCC CTACTGTTGA GCGTTGGCTT TATACTGGTA 
AGAATTTGTA TAACGCATAT GATACTAAAC AGGCTTTTTC TAGTAATTAT GATTCCGGTG 
TTTATTCTTA TTTAACCCCT TATTTATCAC ACCCTCGCTA TTTCAAACCA TTAAATTTAG 
GTCAGAAGAT GAAGCTTACT AAAATATATT TGAAAAAGTT TTCACGCGTT CTTTGTCTTC 
CGATTGGATT TGCATCAGCA TTTACATATA CTTATATAAC CCAACCTAAG CCGGAGGTTA 
AAAAGGTAGT CTCTCAGACC TATGATTTTG ATAAATTCAC TATTGACTCT TCTCAGCGTC 
TTAATCTAAG CTATCGCTAT GTTTTCAAGG ATTCTAAGGG AAAATTAATT AATAGCGACC 
ATTTACAGAA GCAAGGTTAT TCACTCACAT ATATTGATTT ATGTACTGTT TCCATTAAAA 
AAGCTAATTC AAATGAAATT GTTAAATGTA ATTAATTTTG TTTTCTTGAT GTTTGTTTCA 
TCATCTTCTT TTGCTCAGGT AATTGAAATG AATAATTCCC CTCTCCGCGA TTTTGTAACT 
TGGTATTCAA AGCAATCAGG CGAATCCGTT ATTGTTTCTC CCGATGTAAA AGGTACTGTT 
ACTGTATATT CATCTGACGT TAAACCTGAA AATCTACCCA ATTTCTTTAT TTCTGTTTTA 
CGTGCTAATA ATTTTGATAT CGTTGGTTCA AITCCTTCCA TAATTCAGAA GTATAATCCA 
AACAATCAGG ATTATATTGA TGAATTGCCA TCATCTGATA ATCAGGAATA TGATGATAAT 
TCCGCTCCTT CTGGTGGTTT CTTTGTTCCG CAAAATGATA ATGTTACTCA AACTTTTAAA 
ATTAATAACG TTCGGGCAAA GGATTTAATA CGAGTTGTCG AATTGTTTGT AAAGTCTAAT 
ACTTCTAAAT CCTCAAATGT ATTATCTATT GACGGCTCTA ATCTATTAGT TGTTAGTGCA 
CCTAAAGATA TTTTAGATAA CCTTCCTCAA TTCCTTTCTA CTGTTCATTT CCCAACTGAC 
CAGATATTGA TTGAGGGTTT GATATTTGAG CTTCAGCAAG CTCATCCTTT AGATTTTTCA 
TTTGCTGCTG GCTCTCAGCG TGGCACTGTT CCAGGCGGTG TTAATACTGA CCGCCTCACC 
TCTGTTTTAT CTTCTGCTGG TGGTTCGTTC GGTATTTTTA ATGGCGATGT TTTAGGGCTA 
TCAGTTCGCG CATTAAAGAC TAATAGCCAT TCAAAAATAT TGTCTGTGCC ACGTATTCTT 
ACCCTTTCAG GTCAGAAGGG TTCTATCTCT CTTGCCCAGA ATGTCCCTTT TATTACTCCT 
CGTCTGACTC GTGAATCTGC CAATGTAAAT AATCCATTTC ACACCATTCA CCCTCAAAAT 
GTACGTATTT CCATGAGCCT TTTTCCTGTT GCAATCGCTC GCGGTAATAT TCTTCTGGAT 

ATTACCAGCA AGGCCGATAG TTTCACTTrr rrrirTr^r^ ^.^^ ^ 

liiUAC.rrCT iCTACTCAGG CAAGTGATGT TATTACTAAT 

CAAAGAACTA TTCCTACAAC CGTTAATTTC — ir , 
CTCACTCATT ATAAAAACAC TTCTCAAGAT TCTGCCGTAC CCTTCCTGTC TAAAATCCCT 
TTAATCGGCC TCCTCTTTAG CTCCCCCTCT GATTCCAACG AGGAAAGCAC CTTATACGTG 
CTCGTCAAAG CAACCATAGT ACGCGCCCTG TACCGGCGCA TTAAGCGCGG CGGGTGTGGT 
GGTTACGCCC AGCGTGACCC CTACACTTGC CAGCGCCCTA GCGCCCGCTC CTTTCGCTTT 

cttcccttcc ittctcgcca cgttcgccgc ctitccccct caagctctaa atcgggggct 

CCCTTTAGCC TTCCGATTTA GTGCTTTACG GCACCTCGAC CCCAAAAAAC TTGATTTGGG 
TGATGGTTCA CGTAGTGGGC CATCGCCCTG ATAGACGGTT TTTCGCCCTT TGACGTTGGA 
GTCCACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA ACAACACTCA ACCCTATCTC 
GGGCTATTCT TTTGATTTAT AAGGGATTTT GCCGATTTCG GAACCACCAT CAAACAGGAT 
TTTCGCCTGC TGGGGCAAAC CAGCGTGGAC CGCTTGCTGC AACTCTCTCA GGGCCAGGCG 
CTGAAGGGCA ATCACCTGTT GCCCGTCTCG CTGCTCAAAA GAAAAACCAC CCTGGCCCCC 
AATACGCAAA CCGCCTCTCC CCCCGCCTTC GCCCATTCAT TAATGCAGCT GGCACGACAG 
GTTTCCCGAC TGGAAAGCGG GCAGTGAGCG CAACGCAATT AATGTGACTT AGCTCACTCA 
TTACCCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCCT ATGTTGTGTG GAATTGTGAG 
CGGATAACAA TTTCACACGC CAAGGAGACA GTCATAATGA AATACCTATT GCCTACGGCA 
GCCGCTGCAT TCTTATTACT CGCTGCCCAA CCAGCCATGG CCCAGCTCTT CCCCCCATCT 
GATGAGCAGT TCAAATCTCC AACTGCCTCT GTTGTGTGCC TCCTCAATAA CTTCTATCCC 
AGAGAGGCCA AAGTACAGTG GAAGGTCGAT AACGCCCTCC AATCGGGTAA CTCCCAGGAG 
AGTGTCACAG AGCAGGACAG CAAGGACAGC ACCTACAGCC TCAGCAGCAC CCTGACGCTG 
AGCAAAGCAG ACTACGAGAA ACACAAAGTC TACGCCTGCG AAGTCACCCA TCACGCCCTG 
AGCTCGCCCC TCACAAAGAG CTTCAACAGG GGAGAGTGTT CTACAACGCG TCACTTCGCA 
CTGGCCGTCG TTTTACAACC TCGTGACTGG GAAAACCCTG CCCTTACCCA ACCTTAATCC 
CCTTGCAGAA TTCCCTTTCG CCAGCTGGCG TAATAGCCAA GAGGCCCGCA CCGATCGCCC 
TTCCCAACAG TTGCGCAGCC TGAATGGCGA ATCGCGCTTT GCCTGGTTTC CGCCACCAGA 
AGCGGTCCCG GAAAGCTGGC TGGAGTGCGA TCTTCCTGAG GCCGATACGG TCGTCGTCCC 
CTCAAACTGG CAGATGCACG CTTACGATGC GCCCATCTAC ACCAACGTAA CCTATCCCAT 
TACGGTCAAT CCCCCCTTTC TTCCCACCGA GAATCCGACG GGTTGTTACT CCCTCACATT 
TAATGTTGAT GAAAGCTGGC TACAGGAAGG CCAGACGCGA ATTATTTTTG ATCGCGTTCC 
TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG CGAATTTTAA CAAAATATTA 
ACGTTTACAA TTTAAATATT TGCTTATACA ATCTTCCTGT TTTTCGGGCT TTTCTGATTA 
TCAACCGGGG TACATATGAT TGACATGCTA GTTTTACCAT TACCGTTCAT CGATTCTCTT
GTTTGCTCCA GACTCTCAGC CAATGACCTG ATAGCCTTTG TAGATCTCTC AAAAATAGCT 
ACCCTCTCCC GCATTAATTT ATCAGCTAGA ACGGTTCAAT ATCATATTGA TGGTGATTTC 
ACTCTCTCCC GCCTTTC7CA CCC7TTTCAA TCTTTACCTA CACATTACTC AGGCA77CC, 
TTTAAAATAT ATGAGGCTTC TAAAAAITTT TATCCrTGGC TTCAAATAAA COCTTCTCCC 
CCAAAAGTAT TACACCCTCA TAATCTTTTT GCTACAACCC ATTTACCTTT ATGCTCTGAG 
CCTTTAITCC TTAATTCTGC TAATTCHTG CCTTGCCTCT ATGATTTATT GGATCTT 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

m T^? ra: 8 , 11 ? base P at " 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
AATCCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 
ATACCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 
CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 
CTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 
TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 
TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 
TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 
CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 
TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 
AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTC CAAAAGCCTC TCGCTATTTT 
GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 
AATTCCTTTT GGCCTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTC
ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 
TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 
CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 
CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 
AATATCCGCT TCTTGTCAAG ATTACTCTTC ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 
TGTACACCCT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CCGTTCCCTT ATCATTGACC 
CTCTCCCCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTGGA CACAATTTAT 
CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 
CAAAGATGAG TCTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 
GTGGCATTAC GTATTTTACC CCTTTAATGG- AAACTTCCTC ATCAAAAACT CTTTAGTCC" 
CAMCCCTC7 CTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTC- 
CCATCCCCCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTT, 
TCCC.CGCCG ATGGTTGTTG TCATTCTCCC CCCAACTATC CCTATCAAGC TGTTTAAGA.- 
™ C C AAAGGAAGCX GAXAAAGGGA TACAATTAAA GGGXGGXXTT GGAGCCXXTX 
XXTXXGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTACT XGTXCCXXTC 
TAXXCXCACX CCGCXGAAAC TGTTCAAAGT TGITTAGCAA AACCCCATAC AGAAAATTCA 
TTTAGTAAGC TCTGGAAACA GGACAAAAGT TTAGATCCTT ACGCTAACTA TCACCCTTCT 

ZZZ C Z CCCCT Tcm AC ~ ™° ™— 

TGGGTTGGTA TTGGCCTTCC TATCCCTGAA AATGAGGGTG CTGGCTCTGA GCGTGGGGGT 
TGTGACGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC GTGAGTAGGG TGATACACCT 
ATXCCGGGCT ATACTTATAT CAACCCTCTC CACGCCACTT ATCCGCCTCG TACTCAGCAA 
AACCCCCCTA ATCGTAATGC TTCTCTTCAG GAGTGTGAGG CTCTTAATAC TTTGATGm 
CAGAATAATA GGTTCGGAAA TAGGGAGGGG GGAITAACTG TTTATACGGG CACTGTTACT 
CAAGGGAGTG AGGCGGTTAA AAGTTATTAG CAGTACACTC GTGTATGATG AAAACCCATC 
XAXGACGGTT ACTGGAAGGG TAAATTCAGA GAGXGGGGXT TGCATTGTGG CXTTAATGAA 
GATCCATTCC TTTGTGAATA TCAAGCCCAA TGGTGTGAGC TGCCTCAACC TCCTGTCAAT 
GGXGGGGGGG GCTCTGGTGC TGGTTCTGGT GGGGGCTCTG AGGGTCGTCG CTCTGAGGCT 
GGCGGTTCTG AGGGXGGGGG CTCTGAGGCA GGGGGTTGGG GTCGTGGCTC XGGXXGGG^ 
GATTTTGATT ATGAAAAGAT CGGAAAGGGT AATAAGGGGG GXAXGAGGGA AAAXGGGGAX 
GAAAAGGGGC TAGAGTCTGA GGGXAAAGGG aaacttcatt ctgtcggtac tgattacggt 
GCTCCTATCG ATGGTTTCAT TCGTCACGTT TCCCCCCTTG CTAATCCTAA TCCTGCTACT 
GGTGATTTTG CTGCCTCTAA TTCCCAAATC GCTGAAGTCG GXGAGGGXGA TAATTGACGT 
XTAAXCAAXA ATTTCCGTCA ATATTTACCT XGGCXCGCXC AATCGGTTCA ATGTCGCGGT 
TXXGXCXXXA GGGGXGGXAA AGGATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 
™ G XCXTXGCCTX XGXXTTAXAX GXTGGCACGX TXAXGTAXGX ATTTTCXACG 
TTTCCTAACA XACTGCGTAA TAACGAGTCT TAAXCATGCC AGTXCXTXTG GCTAXTCCGX 
XAXXAXTCCC XXXGCXCGGX XXGGXXCTGG XAAGXTXGXX CGGGXATCXG CTT 

CTICCGTAAG ATAGCIA ™ CTAmcATT c ™" ™- 

GGGXXAACXG AAXTGTXGXG GGXTAXGXCX GXGAXAXXAG CGCXCAATTA CGGXCXGAGX 
TTGTXGAGGG XGTTGAGXXA AXXGXCGGGX CXAAXGGGCX XGGCXGXXTX XAXGTXAXXG 
ICXCXCIAAA GGCXGCXAIT TTCATTTTTC *r^~ aiuitatxc 

TTCAXTTTrG ACGXTAAACA AAAAAICCXX XCTIAXXXCG 
AXTGGGAXAA AXAAXAXGGG XGXTXATTTX GXAAGXGGGA AAXXAGGGXG XGGAAAGAGG 

CXGGXXAGCG TXGGXAAGAX XGAGGAXAAA AXTGXAGGXG GC XGCAAAAT ^ 

1U1AGCTG GGTGCAAAAT ACCAACTAAT 
TTGAXXXAA GGGXXCAMA CCXCCCGCAA GXGGGGAGGX TCCCXAAAAC GCCTCGCC7X 
GXXAGAAXAC GGGAXAAGGG TTGXAXAXGX GAXXXCGXTG GXAXXGGGCG GGGTAATGAX 
TCGXACGAXG AAAAXAAAAA CGCGXXGGTX GTXCXGGAXG AGXGCGGXAC XXGCXXXAAX 
ACGGGXTCXT GGAATGAXAA GGAAAGAGAG CGGAXTAXTC ^GGT— --r.~ — 
TTTGTCGGTA ClrTATATTC TCTTATTACT CGCTCGAAAA TGGCTCTGCC TAAATTACAT 
GTTGCCGTTC TTAAATATGG GGAPICTCAA TTAAGCCCTA CTCTTCAGCG TTGGCTTTAT 
ACTGGTAAGA ATTTGTATAA CCCATATCAT ACIAMCAGG CTTTTTCTAG TAATTATGAT 
TCCGCTGTTT ATTCTTATTT AACGCCTTAT TTATCAGACG GTCGGTATTT CAAACCATTA 
AAITTAGGTC AGAAGAIGAA GGTTACIAAA ATATATTTCA AAAAGTTTTC ACGCGTTGTT 
TGTGITGGGA TTCGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAACCCC 
GACGTTAAAA AGGTAGTGTG TGAGAGCTAI CATTITCATA AATTCACTAT TGACTCTTCT 
CAGCGIGTTA ATCTAAGCTA TCCCTATCTT TTCAAGGATT GTAAGCGAAA ATTAATTAAT 
AGCGACGATT TAGAGAAGGA AGGTTATTCA CTCACATATA TTCATTTATG IAGTGITTCG 
AITAAAAAAG GTAATTCAAA TGAAATTCTT AAATGTAATT AATTTTGTTT TCTTCATGTT 
TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 
TGTAACTTGG TATTCAAAGC AATCACGCCA ATCCCTTA7T GTTIGICGCG ATGTAAAAGG 
TACTGTTACT GTATATTCAT CTGACGTTAA AGCTGAAAAT CTACCCAATT TC7TTATTTC 
TGTTTTAGGT CCTAATAATT TTGATATGCT TGGTTCAATT CCTTCCATAA .TTCAGAAGTA 
IAATGCAAAC AATCAGGATT ATATTGAIGA ATTGGCATCA TGTGATAATG ACGAATATGA 
TGATAAITGC GCTCCTTCTC CTCGTTTCTT TGTTCGGCAA AATGATAAIG TTACTCAAAC 
TTTIAAAATT AATAACCTTC GGCCAAAGCA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 
GTCIAATACT TCTAAATGCT CAAAIGIAIT AICTATTGAC GGCTGTAATG 7A1TACTTGT 
TACIGCACCT AAAGATATTT TAGATAACCT TGGTGAATTC CTTTCTAGIG TTGATTTGCC 
AACTGAGGAG ATATTGATTG AGCGTTTCAT ATTTGAGGTI CAGCAAGGTC ATGGTTTAGA 
TTTTTCATTT CCTCCTGGCT CTCAGCCTGG GACIGTTGGA GCCCCTCTTA ATACTGACCG 
CCTCACCTCT GTTTTATGTT CTGCTCGTGC TTCGTTCGGT ATTHTAATG CCCATCTTTT 
AGGGGIAICA GTTGGGGGAT TAAAGAGIAA TAGGGATTCA AAAATATTCT GTGTGGGAGG 
TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCIGIGTT GGCCAGAATG TCCGTTTIAI 
TACTGGTCCT GTGACTGGTG AATCTGCCAA TCTAAATAAT CCATTTCAGA CGATTGAGCG 
TGAAAATGTA GGTATTTGGA TGAGCGTTTT TCCTCTTGCA ATGGCTGGGG CTAATATTCT 
TCTGGATATT AGCAGCAACG CCGATAGTTT GAGTTGTTCT ACTCAGGCAA GTGATCTTAT 
TAGTAATCAA AGAACTATTG CTACAACCCT TAATTTCCGT GATGGAGAGA CTGTTTTACT 
GGGTGGCCTC AGTGATTATA AMAGAGTTC TCAAGATTC? GCCCTACCCT TCCTGTCTAA
AATCCGTTTA AIGCGCGTCC TGITTAGCTC GCGCTCTGAT TGCAAGCAGG AAAGCACGTT 
ATACGTGCTC CTCAAACCAA GGATAGTACG CGCCCTCTAG CCCCCCATTA AGCGCGCCGC 



CTCTCCTCCT TACGCGCAGC CTCACCCCTA CACTTSCCAC 
TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCCTCAA GCTCTAAATC 
GCCCGCTCCC TTTACCCTTC CGATTTAGTG CTTTACCGCA CCTCGACCCC AAAAAACTTG 
ATTTGGGTGA TCGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTCGAACA ACACTCAACC 
CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 
ACAGCATTTT CGCCTCCTGG GGCAAACCAG CCTCGACCCC TTCCTGCAAC TCTCTCAGGG 
CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTCAAAAGAA AAACCACCCT 
GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 
ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 
TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC CGCTCCTATG TTCTGTCGAA 
TTGTCAGCGG ATAACAATTT CACACGCCAA GGAGACAGTC ATAATGAAAT ACCTATTGCC 
TACCGCAGCC GCTGGATTGT TATTACTCGC TGCCCAACCA GCCATGGCCG AGCTCTTCCC 
GCCATCTGAT GAGCAGTTGA AATCTGGAAC TGCCTCTGTT GTGTGCCTGC TGAATAACTT 
CTATCCCAGA CAGGCCAAAG TACAGTGGAA GGTGGATAAC GCCCTCCAAT CGGCTAACTC 
CCAGGAGACT GTCACAGAGC AGGACAGCAA GGACAGCACC TACAGCCTCA GCAGCACCCT 
GACGCTGAGC AAAGCAGACT ACGAGAAACA CAAAGTCTAC GCCTCCCAAC TCACCCATCA 
GCGCCTGACC TCGCCCGTCA CAAAGAGCTT CAACACGCCA GAGTGTTCTA GAACGCCTCA 
CTTGGCACTG CCCCTCCTTT TACAACGTCG TGACTGGGAA AACCCTGGCG TTACCCAAGC 
TTTGTACATG GAGAAAATAA ACTCAAACAA AGCACTATTG CACTCGCACT CTTACCGTTA 
CTGTTTACCC CTCTGGCAAA AGCCGCCTCC ACCAAGGGCC CATCGCTCTT CCCCCTGCCA 
CCCTCCTCCA AGAGCACCTC TCGGGGCACA GCGGCCCTGG GCTGCCTGGT CAACACTAAT 
TCCCCCAACC GGTGACGGTG TCCTGGAACT CAGCCCCCCT GACCAGCGGC CTCCACACCT 
TCCCGGCTGT CCTACAGTCC TCAGGACTCT ACTCCCTCAG CAGCGTGCTG ACCCTCCCCT 
CCAGCAGCTT GGGCACCCAG ACCTACATCT GCAACGTGAA TCACAAGCCC AGCAACACCA 
ACGTCCACAA GAAAGCAGAG CCCAAATCTT GTACTAGTGG ATCCTACCCG TACGACCTTC 
CGGACTACGC TTCTTAGCCT GAAGGCGATG ACCCTGCTAA GCCTGCATTC AATAGTTTAC 
AGGCAACTGC TACTGAGTAC ATTGGCTACG CTTCCGCTAT CGTAGTAGTT ATAGTTGGTG 
CTACCATAGG GATTAAATTA TTCAAAAAGT TTACGAGCAA GGCTTCTTAA GCAATAGCGA 
AGAGGCCCGC ACCGATCGCC CTTCCCAACA CTTGCGCAGC CTGAATGGCG AATCCCCCTT 
TGCCTGGTTT CCGGCACCAG AAGCGGTGCC CGAAAGCTGG CTGGAGTGCG ATCTTCCTGA
GCCCGATACG GTCCTCGTCC CCTCAAACTC CCACATCCAC GGTTACGATC CCCCCATCTA 
CACCAACGTA ACCTATCCCA TTACCGTCAA TCCGCCGTTT GTTCCCACCG AGAATCCGAC 
GGCTTCTTAC TCCCTCACAT TTAATCTTCA TGAAACCTGG CTACACGAAG GCCAGACGCG 
.--ATTATTi ;T GATGGCGTTC CTATTCGTTA AAAAATCACC TGAT7TAACA AAAA777AAC 
CCGAATTTTA ACAAAATATT AACCTTTACA ATTTAAATAT TTGC7TATAC AATCTTCCTC 
TTTT7GGGGG TTTTCTGATT ATGAACGGCG CIACAIATCA TTGACATCCT AGTTTTACGA 
TTACGGTTGA TCGATTCTCT TCTITGCTCC AGACIGIGAG GCAATGACCT GATAGGGTTT 
GTACAICTCT CAAAAATAGC TACGGTCTCC GGGArTAATT TATCACCTAG AACGGTTGAA 
TATCATATTG ATCCTCATTT GACTCTCTCG GCCGTTTCTG ACCCnTTCA ATCITTACGT 
ACACAITACT GAGGCATTGG ATTTAAAATA TATCAGGGTT CTAAAAATTT TTATCCTTGC 
GTTGAAATAA AGGCITCTCC CGCAAAAGTA TTAGAGGGIG ATAATGTTTT TGCTAGAACG 
GATTTAGCTT TAIGCTCTGA GCCTTTATTC CITAAITTIG CTAATTCTTT GGCTTGCCIG 
TATGATTTAT TGGACCTT 
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: misc difference

(B) LOCATION: replace(5, "")

(D) OTHER INFORMATION: /note= "S REPRESENTS EQUAL MIXTURE OF G AND C"

(ix) FEATURE:

(A) NAME/KEY: misc difference

(B) LOCATION: replace(6, "")

(D) OTHER INFORMATION: /note= "M REPRESENTS EQUAL MIXTURE OF A AND C"

(ix) FEATURE:

(A) NAME/KEY: misc difference

(B) LOCATION: replace(8, "")

(D) OTHER INFORMATION: /note= "R REPRESENTS EQUAL MIXTURE OF A AND G"

(ix) FEATURE:

(A) NAME/KEY: misc difference

(B) LOCATION: replace(11, "")

(D) OTHER INFORMATION: /note= "K REPRESENTS EQUAL MIXTURE OF G AND T"

(ix) FEATURE:

(A) NAME/KEY: misc difference

(B) LOCATION: replace(20, "")

(D) OTHER INFORMATION: /note= "W REPRESENTS EQUAL MIXTURE OF A AND T"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
AGGTSMARCT KCTCCAGTCW GG
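Because SEQ ID NO: 6 carries five two-way degenerate positions (S, M, R, K and W in the standard IUPAC one-letter code, matching the mixtures its features describe), it denotes a pool of 2^5 = 32 distinct 22-mers rather than a single oligonucleotide. The Python sketch below expands such a primer into its concrete sequences. It assumes position 10 reads T and position 20 reads W (per the position-20 feature); the function name `expand_degenerate` is hypothetical and the code is an illustration, not part of the listing.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes needed for SEQ ID NO: 6.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "GC",  # equal mixture of G and C
    "M": "AC",  # equal mixture of A and C
    "R": "AG",  # equal mixture of A and G
    "K": "GT",  # equal mixture of G and T
    "W": "AT",  # equal mixture of A and T
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer (hypothetical helper)."""
    choices = [IUPAC[base] for base in primer.upper()]
    # product() takes one base from each position's set of allowed bases.
    return ["".join(combo) for combo in product(*choices)]


# Assumed reading of SEQ ID NO: 6 (position 10 = T, position 20 = W).
SEQ6 = "AGGTSMARCTKCTCCAGTCWGG"
variants = expand_degenerate(SEQ6)
print(len(variants))  # 32, one variant per combination of the five mixed positions
```

Each extra two-way degenerate position doubles the pool, so the fraction of any single exact sequence in an equimolar synthesis is 1/32 here.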
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ACCTCCACCT CCTCCACTCT CG 
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
ACCTCCACCT CCTCGAGTCA GG 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
ACCTCCACCT TCTCCACTCT GC 
(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
AGGTCCAGCT TCTCCAGTCA GC 
(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
ACCTCCAACT CCTCCACTCT GC 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
ACCTCCAACT GCTCGAGTCA GC 
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
AGGTCCAACT TCTCGAGTCT GG 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AGGTCCAACT TCTCGAGTCA GG 
(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: misc difference

(B) LOCATION: replace(5..6, "")

(D) OTHER INFORMATION: /note= "N=INOSINE"

(ix) FEATURE:

(A) NAME/KEY: misc difference

(B) LOCATION: replace(8, "")

(D) OTHER INFORMATION: /note= "N=INOSINE"

(ix) FEATURE:

(A) NAME/KEY: misc difference

(B) LOCATION: replace(11, "")

(D) OTHER INFORMATION: /note= "N=INOSINE"

(ix) FEATURE:

(A) NAME/KEY: misc difference

(B) LOCATION: replace(20, "")

(D) OTHER INFORMATION: /note= "V REPRESENTS EQUAL MIXTURE

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
ACCTNNANCT NCTCCAGTCV CG 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CTATTAACTA GTAACCGTAA CACTGGTCCC TTGCCCCA 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
AGGCTTACTA GTACAATCCC TGGGCACAAT 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CCAGTTCCGA GCTCCTTCTG ACTCAGGAAT CT 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CCACTTCCCA CCTCCTCTTG ACGCAGCCCC CC 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
CCACTTCCCA GCTCGTGCTC ACCCAGTCTC CA 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
CCACTTCCCA CCTCCAGATC ACCCAGTCTC CA 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
CCACATGTCA CCTCCTGATG ACCCACACTC CA 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
CCAGATGTGA GCTCCTCATG ACCCAGTCTC CA 
(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCACTTCCCA CCTCCTCATG ACACAGTCTC CA 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
CCAGCATTCT AGAGTTTCAG CTCCAGCTTC CC 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
GCGCCGTCTA GAATTAACAC TCATTCCTCT TGAA
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GATCCTAGGC TCAAGGCGAT GACCCTCCTA ACGCTCC 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
ATTCAATACT TTACACCCAA CTCCTACTCA C7ACA 
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
TTGGCTACGC TTGCCCTATC GTAGTACTTA TAGTT 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CCTGCTACCA TAGGGATTAA ATTATTCAAA AAGTT 
(2) INFORMATION FOR SEQ ID NO:31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TACGAGCAAG GCTTCTTA 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
AGCTTAAGAA GCCTTCCTCC TAAACTTTTT CAATAATTT 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
AATCCCTATG CTAGCACCAA CTATAACTAC TACCAT 
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
AGCCCAAGCG TAGCCAATGT ACTCAGTAGC ACTTG 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
CCTGTAAACT ATTGAATGCA GCCTTAGCAG GGTC
(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
ATCGCCTTCA GCCTAG 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
CATTTTTGCA CATGGCTTAG A 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
TAGCATTAAC GTCCAATA 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
ATATATTTTA CTAAGCTTCA TCTTCT 
(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 
CACAAAGAAC GCGTGAAAAC TTT 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
CCCCCCCTCT TCCCTATTCC TTAACAACCC TTGCT 
(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
AAACCACCCC CAGTCCCAAG TCACCCCTCT CAAATTGTTA TC 
(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 
GGCGAAAGGG AATTCTGCAA GGCGATTAAG CTTGGGTAAC GCC 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
GGCGTTACCC AAGCTTTCTA CATGGAGAAA ATAAAG 
(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
TCAAACAAAC CACTATTCCA CTCCCACTCT TACCGTTACC GT 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
TACTCTTTAC CCCTCTGACA AAAGCCGCCC ACGTCCAGCT GC 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
TCCACTCACC CCTATTCTCC CCAGCCATTG TACTACTCCA TCCC
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
TGGCGAAAGG CAATTCCCAT CCACTAGTAC AATCCCTC 
(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
GCCACAATAC GCCTGACTCG ACCACCTGCA CCAGGGCGCC TT
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
TTGTCACACG GGTAAACAGT AACGCTAACG CTAACTGTCC CA 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
GTGCAATACT CCTTTCTTTC ACTTTATTTT CTCCATGTAC AA 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
TAACCGTAAC AGTCCCACTG C 
(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
CACCTTCATG AATTCCCCAA GCACACAGTC AT 
(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
AATTCGCCAA CGAGACACTC AT 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
AATGAAATAC CTATTGCCTA CGGCAGCCGC TGGATTGTT 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
ATTACTCGCT CCCCAACCAC CCATCGCCCA GCTCGTGAT 
(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAAT 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
TCTAGAACGC GTC 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
TTCACGTTCA ACCTTACCCG TTCTAGAATT AACACTCATT CCTG
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:60: 
TGGATATCTG GAGTCTGGGT CATCACCAGC TCGGCCATG 
(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
CCTCCTTCCC CACCCACTAA TAACAATCCA CCGCCTCCC 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
GTACCCAATA CCTATTTCAT TATCACTCTC CTTCGCC 
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
TCACTGTCTC CTTGCCGTCT CAAATTCTTA
(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 
TAACACTCAT TCCGGATGGA ATTCTGCACT CTCGCT 
(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 
CCCAGTGCCA AGTGACGCCT TCTA 
(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid 



(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
ATATATTTTA GTAAGCTTCA TCTTCT 
(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GACAAAGAAC GCGTGAAAAC TTT 
(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 76 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
CTGAACCTGT CTGGGACCAC AGTTGATGCT ATAGGATCAG ATCTAGAATT CATTTAGAGA
CTGGCCTGGC TTCTGC

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 80 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
TCGACCGTTC CTACCAATAA TCCAATTAAT GGAGTAGCTC TAAATTCAGA ATTCATCTAC 
ACCCAGTGCA TCCAGTAGCT 
(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 
GGTAAACAGT AACCGTAAGA CTCCCAC 
(2) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
CGCCTTCAGC CTAAGAAGCG TAGTCCGGAA CGTCGTACGG CTACGATCCA 
(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 
CACCGGTTCG GGGAATTAGT CTTCACCAGC CAGCCCAGCC C 
(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
ATTCCACACA TTATACGAGC CGGAAGCATA AAGTGTCAAG CCTGGGGTGC C 
(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 
CTGCTCATCA GATGGCGGGA AGAGCTCCCC CATGGCTGGT TG 
(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:
CAACACACTC ACCCAGGGCC CGACCTCCGC CATCCCTCGT TG 



WO 92/06204 

PCT/US91/07149

I Claim: 



1. A composition of matter comprising a 
plurality of cells containing diverse combinations of first
and second DNA sequences encoding first and second 
polypeptides which form heteromeric receptors, one or both 
of said polypeptides being expressed as fusion proteins on 
the surface of a cell. 

2. The composition of claim 1, wherein said
plurality of cells are E. coli.

3. The composition of claim 1, wherein said 
heteromeric receptors are selected from the group consisting of
antibodies, T cell receptors, integrins, hormone receptors 
and transmitter receptors. 

4. The composition of claim 1, wherein said
first and second DNA sequences encode functional portions 
of heteromeric receptors. 

5. The composition of claim 4, wherein said 
first and second DNA sequences encode functional portions
of the variable heavy and variable light chains of an 
antibody. 

6. The composition of claim 1, wherein said 
cell produces filamentous bacteriophage. 

7. The composition of claim 6, wherein said 
filamentous bacteriophage are selected from the group 
consisting of M13, fd and f1.

8. The composition of claim 6, wherein at least 
one of the encoded first or second polypeptides is 
expressed as a fusion protein with gene VIII.

9. A kit for the preparation of vectors useful
for the coexpression of two or more DNA sequences encoding
polypeptides which form heteromeric receptors, comprising
two vectors, a first vector having two pairs of restriction
sites symmetrically oriented about a cloning site which can
be combined with a second vector having two pairs of
restriction sites symmetrically oriented about a cloning
site and in an identical orientation to that of the first
vector, wherein one or both vectors contain sequences
necessary for expression of polypeptides encoded by DNA
sequences inserted in said cloning sites.

10. The kit of claim 9, wherein said first and 
second vectors are circular. 

11. The kit of claim 9, wherein said polypeptides are
expressed as fusion proteins on the surface of a cell.

12. The kit of claim 9, wherein said cell 
produces filamentous bacteriophage. 

13. The kit of claim 9, wherein said filamentous 
bacteriophage is selected from the group consisting of M13,
fd and f1.



14. The kit of claim 13, wherein at least one of 
the DNA sequences is expressed as a fusion protein with 
gene VIII. 

15. The kit of claim 9, wherein said two pairs 
of restriction sites are Hind III-Mlu I and Hind III-Mlu I.
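Claims 15, 25, 42 and 53 name Hind III and Mlu I as the restriction-site pairs. As a minimal sketch, assuming only the standard recognition sequences of these enzymes (Hind III: AAGCTT; Mlu I: ACGCGT), the Python code below reports where each site occurs in a DNA string. Both recognition sequences are palindromic, so scanning one strand finds every site. The function name `find_sites` is hypothetical, and the code is an illustration rather than part of the described method.

```python
import re

# Recognition sequences of the two enzymes named in the claims.
# Both are palindromic, so searching a single strand finds every site.
SITES = {"HindIII": "AAGCTT", "MluI": "ACGCGT"}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition site in seq.

    Spaces (as used in the sequence listing's 10-base groups) are ignored.
    A lookahead pattern is used so that overlapping matches are reported too.
    """
    clean = seq.replace(" ", "").upper()
    return {
        name: [m.start() for m in re.finditer(f"(?={site})", clean)]
        for name, site in SITES.items()
    }


if __name__ == "__main__":
    # Example inputs: SEQ ID NO: 39 and SEQ ID NO: 58 from the listing above.
    print(find_sites("ATATATTTTA CTAAGCTTCA TCTTCT"))
    print(find_sites("TCTAGAACGC GTC"))
```

Applied to SEQ ID NO: 39, the sketch reports a Hind III site starting at position 12 (0-based); applied to SEQ ID NO: 58, it reports an Mlu I site at position 6.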

16. A cloning system for the coexpression of two
or more DNA sequences encoding polypeptides which form a
heteromeric receptor, comprising a set of first vectors
having a diverse population of first DNA sequences and a
set of second vectors having a diverse population of second
DNA sequences, said first and second vectors having two
pairs of restriction sites symmetrically oriented about a
cloning site for containing said first and second
populations of DNA sequences so as to allow only the
operational combination of vector sequences containing said
first and second DNA sequences.

17. The cloning system of claim 16, wherein said 
first and second vectors are circular. 

18. The cloning system of claim 16, wherein said 
heteromeric receptors are selected from the group consisting of
antibodies, T cell receptors, integrins, hormone receptors 
and transmitter receptors. 

19. The cloning system of claim 16, wherein said 
first and second DNA sequences encode functional portions
of heteromeric receptors.

20. The cloning system of claim 19, wherein said 
first and second DNA sequences encode functional portions 
of the variable heavy and variable light chains of an 
antibody.



21. The cloning system of claim 16, wherein said 
coexpression of two or more DNA sequences encoding 
polypeptides which form a heteromeric receptor is on the 
surface of a cell.



22. The cloning system of claim 16, wherein said 
cell produces a filamentous bacteriophage. 

23. The cloning system of claim 22, wherein said
filamentous bacteriophage is selected from the group
consisting of M13, fd and f1.

24. The cloning system of claim 23, wherein at 
least one of the DNA sequences is expressed as a fusion 
protein with the protein product of gene VIII.

25. The cloning system of claim 16, wherein said 
two pairs of restriction sites are Hind III-Mlu I and Hind 
III-Mlu I. 

26. A plurality of expression vectors containing 
a plurality of possible first and second DNA sequences 
encoding polypeptides which form a heteromeric receptor 
exhibiting binding activity toward a preselected molecule,
said DNA sequences encoding heteromeric receptors being
operatively linked to genes encoding surface proteins of a
cell.



27. The expression vectors of claim 26, wherein 
said expression vectors are circular. 

28. The expression vectors of claim 23, wherein 
said heteromeric receptors are selected from the group 
consisting of antibodies, T cell receptors, integrins, 
hormone receptors and transmitter receptors. 

29. The expression vectors of claim 26, wherein 
said first and second DNA sequences encode functional 
portions of heteromeric receptors. 

30. The expression vectors of claim 29, wherein 
said first and second DNA sequences encode functional 
portions of the variable heavy and variable light chains of 
an antibody. 

31. The expression vectors of claim 26, wherein
said cells produce filamentous bacteriophage. 

32. The expression vectors of claim 26, wherein 
said filamentous bacteriophage are selected from the group 
consisting of M13, fd and f1.

33. The expression vectors of claim 32, wherein 
at least one of the encoded first or second polypeptides is 
expressed as a fusion protein with gene VIII. 

34. A method of constructing a diverse 
population of vectors capable of expressing a diverse 
population of heteromeric receptors, comprising: 

(a) operationally linking to a first vector 
a first population of diverse DNA 
sequences encoding a diverse population
of first polypeptides, said first 
vector having two pairs of restriction 
sites symmetrically oriented about a 
cloning site; 

(b) operationally linking to a second 
vector a second population of diverse 
DNA sequences encoding a diverse 
population of second polypeptides, said 
second vector having two pairs of 
restriction sites symmetrically 
oriented about a cloning site in an 
identical orientation to that of the 
first vector; and 



(c) combining the vector products of step 
(a) and (b) under conditions which 
allow only the operational combination 
of vector sequences containing said 
first and second DNA sequences. 

35. The method of claim 34, wherein said first
and second vectors are circular.

36. The method of claim 34, wherein said 
heteromeric receptors are selected from the group
consisting of antibodies, T cell receptors, integrins,
hormone receptors and transmitter receptors.

37. The method of claim 34, wherein said first 
and second DNA sequences encode functional portions of the 
variable heavy and variable light chains of an antibody. 

38. The method of claim 34, wherein said 
expression of a diverse population of heteromeric receptors 
is on the surface of a cell. 

39. The method of claim 37, wherein said cell 
produces a bacteriophage. 

40. The method of claim 39, wherein said 
filamentous bacteriophage is selected from the group
consisting of M13, fd and f1.

41. The method of claim 34, wherein at least one 
of said first or second DNA sequences is expressed as a 
gene VIII fusion protein.

42. The method of claim 34, wherein said two 
pairs of restriction sites are Hind III-Mlu I and Hind III- 
Mlu I. 

43. The method of claim 34, wherein said
combining step further comprises:

(c1) restricting said first vector with a
restriction enzyme recognizing one of 
the restriction sites encoded in said 
two pairs of restriction sites; 

(c2) restricting said second vector with a
different restriction enzyme 
recognizing the second restriction 
site encoded in said two pairs of 
restriction sites; 

(c3) digesting the 3' ends of said
restricted first and second vectors 
with an exonuclease; and 

(c4) annealing said first and second
vectors. 
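Steps (c1) to (c4) rely on the restricted vectors regaining single-stranded ends that can pair with each other. The Python sketch below is a simplified, hypothetical model, not the applicant's protocol: it treats each linearized vector as a perfect duplex described by its top strand, ignores the overhangs left by the restriction enzymes, and assumes the exonuclease removes exactly k nucleotides from every 3' end. Under these assumptions the right-hand end of one molecule can anneal to the left-hand end of another only when the last k bases of the first equal the first k bases of the second, that is, when the two molecules share sequence at the junction. The names `exo_overhangs` and `ends_anneal` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def exo_overhangs(top: str, k: int) -> tuple[str, str]:
    """Model a 3'->5' exonuclease trimming k nt from both 3' ends of a duplex.

    `top` is the 5'->3' top strand of a linear duplex; the bottom strand is
    its reverse complement. Returns the single-stranded 5' tails, each
    written 5'->3': the left tail lies on the top strand, the right tail on
    the bottom strand.
    """
    if not 0 < k <= len(top):
        raise ValueError("k must be between 1 and the length of the duplex")
    left_tail = top[:k]              # top-strand 5' end left unpaired
    right_tail = revcomp(top)[:k]    # bottom-strand 5' end left unpaired
    return left_tail, right_tail


def ends_anneal(right_tail_a: str, left_tail_b: str) -> bool:
    """True if A's right tail (bottom strand) pairs with B's left tail (top strand)."""
    return left_tail_b == revcomp(right_tail_a)


if __name__ == "__main__":
    # Hypothetical fragments sharing the 7-base sequence GATTACA at the junction.
    a = "TTTTGATTACA"
    b = "GATTACAGGGG"
    _, a_right = exo_overhangs(a, 7)
    b_left, _ = exo_overhangs(b, 7)
    print(ends_anneal(a_right, b_left))  # True
```

The shared-sequence test is chosen because it is the simplest condition that makes the exposed tails complementary; a fuller model would also track the enzyme-generated overhangs and partial digestion.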

44. A method for selecting a heteromeric
receptor exhibiting binding activity toward a preselected
molecule from a population of diverse heteromeric
receptors, comprising:

(a) operationally linking to a first vector
a first population of diverse DNA
sequences encoding a diverse population 
of first polypeptides, said first 
vector having two pairs of restriction 
sites symmetrically oriented about a 
cloning site; 

(b) operationally linking to a second 
vector a second population of diverse 
DNA sequences encoding a diverse 
population of second polypeptides, said 
second vector having two pairs of 
restriction sites symmetrically 
oriented about a cloning site in an 
identical orientation to that of the 
first vector; 

(c) combining the vector products of step
(a) and (b) under conditions which
allow only the operational combination
of vector sequences containing said
first and second DNA sequences;



(d) introducing said population of combined 
vectors into a compatible host under 
conditions sufficient for expressing 
said population of first and second DNA 
sequences; and

(e) determining the heteromeric receptors 
which bind to said preselected 
molecule. 

45. The method of claim 44, wherein said first
and second vectors are circular. 

46. The method of claim 44, wherein said
heteromeric receptors are selected from the group
consisting of antibodies, T cell receptors, integrins,
hormone receptors and transmitter receptors. 

47. The method of claim 44, wherein said first 
and second DNA sequences encode functional portions of 
heteromeric receptors. 

48. The method of claim 47, wherein said first 
and second DNA sequences encode functional portions of the 
variable heavy and variable light chains of an antibody. 

49. The method of claim 44, wherein said 
expression of a diverse population of heteromeric receptors
is on the surface of a cell. 

50. The method of claim 49, wherein said cell 
produces a filamentous bacteriophage. 

51. The method of claim 50, wherein said 
filamentous bacteriophage is selected from the group 
consisting of M13, fd and f1.

52. The method of claim 51, wherein at least one 
of said first or second DNA sequences is expressed as a
gene VIII fusion protein.

53. The method of claim 44, wherein said two 
pairs of restriction sites are Hind III-Mlu I and Hind III-
Mlu I. 

54. The method of claim 44, wherein said
combining step further comprises:

(c1) restricting said first vector with a
restriction enzyme recognizing one of 
the restriction sites encoded in said 
two pairs of restriction sites; 

(c2) restricting said second vector with a
different restriction enzyme 
recognizing the second restriction 
site encoded in said two pairs of 
restriction sites; 

(c3) digesting the 3' ends of said
restricted first and second vectors 
with an exonuclease; and 

(c4) annealing said first and second
vectors. 

55. A method for determining the nucleic acid
sequences encoding a heteromeric receptor exhibiting 
binding activity toward a preselected molecule from a 
diverse population of heteromeric receptors, comprising: 

(a) operationally linking to a first vector 
a first population of diverse DNA 
sequences encoding a diverse population 
of first polypeptides, said first 
vector having two pairs of restriction 
sites symmetrically oriented about a 
cloning site; 

(b) operationally linking to a second 
vector a second population of diverse 
DNA sequences encoding a diverse 
population of second polypeptides, said 
second vector having two pairs of 
restriction sites symmetrically 
oriented about a cloning site in an 
identical orientation to that of the 
first vector; 



(c) combining the vector products of step
(a) and (b) under conditions which
allow only the operational combination
of vector sequences containing said
first and second DNA sequences;



(d) introducing said population of combined 
vectors into a compatible host under 
conditions sufficient for expressing 
said population of first and second DNA 
sequences;

(e) determining the heteromeric receptors
which bind to said preselected
molecule;



(f) isolating the nucleic acid sequences 
encoding said first and second
polypeptides; and

(g) sequencing said nucleic acid sequences. 

56. The method of claim 55, wherein said first 
and second vectors are circular. 

57. The method of claim 55, wherein said
heteromeric receptors are selected from the group consisting of
antibodies, T cell receptors, integrins, hormone receptors 
and transmitter receptors. 

58. The method of claim 55, wherein said first 
and second DNA sequences encode functional portions of 
heteromeric receptors. 

59. The method of claim 58, wherein said first 
and second DNA sequences encode functional portions of the 
variable heavy and variable light chains of an antibody. 

60. The method of claim 55, wherein said 
expression of a diverse population of heteromeric receptors 
is on the surface of a cell filamentous bacteriophage 
selected from the group consisting of M13, fd and f1 and at
least one of said first or second DNA sequences is 
expressed as a gene VIII fusion protein. 

61. The method of claim 55, wherein said cell 
produces filamentous bacteriophage. 

66. A vector comprising two copies of a gene
encoding a filamentous bacteriophage coat protein, one copy 
of said gene capable of being operationally linked to a DNA 
sequence encoding a polypeptide of a heteromeric receptor 
wherein said DNA sequence can be expressed as a fusion 
protein on the surface of said filamentous bacteriophage or 
as a soluble polypeptide. 

67. The vector of claim 66, wherein said two 
copies of said gene encode substantially the same amino
acid sequence but have different nucleotide sequences. 
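Claims 67 and 72 rely on the degeneracy of the genetic code: two copies of the coat-protein gene can encode the same amino acid sequence while differing in nucleotide sequence. As a minimal sketch, assuming the standard genetic code and in-frame coding sequences, the Python code below translates two open reading frames and checks that property. The example sequences are hypothetical toy inputs, not the M13 gene VIII sequences of the vector, and the check tests for an identical protein rather than "substantially the same" one.

```python
from itertools import product

# Standard genetic code with codons ordered TCAG x TCAG x TCAG; '*' marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    orf = orf.replace(" ", "").upper()
    if len(orf) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def synonymous_copies(orf_a: str, orf_b: str) -> bool:
    """True if two ORFs differ in nucleotides yet encode the identical protein."""
    a = orf_a.replace(" ", "").upper()
    b = orf_b.replace(" ", "").upper()
    return a != b and translate(a) == translate(b)


if __name__ == "__main__":
    # Hypothetical ORFs: GCT/GCA both encode Ala, GAA/GAG both encode Glu.
    print(translate("ATGGCTGAA"))                         # MAE
    print(synonymous_copies("ATGGCTGAA", "ATGGCAGAG"))    # True
```

Using a second, codon-shuffled copy in this way is one reason such a vector can carry two copies of the same coat-protein gene while the copies remain distinguishable at the DNA level.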

68. The vector of claim 66, wherein said one 
copy of said gene is expressed on the surface of said 
filamentous bacteriophage. 

69. The vector of claim 66, wherein said 
bacteriophage coat protein is M13 gene VIII. 

70. The vector of claim 66, wherein said vector 
has substantially the same sequence as that shown in Figure
2 (SEQ ID NO: 1).

71. A vector comprising sequences necessary for 
the coexpression of two or more inserted DNA sequences 
encoding polypeptides which form heteromeric receptors and 
two copies of a gene encoding a filamentous bacteriophage 
coat protein, one copy of said gene capable of being 
operationally linked to one of said two or more inserted 
DNA sequences wherein said DNA sequence can be expressed as 
a fusion protein on the surface of said filamentous 
bacteriophage or as a soluble polypeptide. 

72. The vector of claim 71, wherein said two 
copies of said gene encode substantially the same amino 
acid sequence but have different nucleotide sequences. 

73. The vector of claim 71, wherein said one
copy of said gene is expressed on the surface of said 
filamentous bacteriophage. 

74. The vector of claim 71, wherein said 
bacteriophage coat protein is M13 gene VIII.

75. The vector of claim 71, wherein said vector 
has substantially the same sequence as that shown in Figure
6 (SEQ ID NO: 5).
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an mm I b mm m 

%\ M$ attca^tIaa a «[ 218 

511 AAACATTTTA CTATTACCCC CTCTGGCAAA AfTTfTTTTr KllSSSfSS TATCCAGTCT 540 
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATHATflrTr rrrr^WS JCGCTATTTT 600 

%\ ATGAATCTTT MS ? " TO $ 

Si c T HM M$ S ifSffi SSS 

901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGfTTT? SSJ^ifSI i£^ TTT 900 

isll MB iH f M ?c 6 2°o 

K! CAGGCGATGA flHS^ H ™ ^ 

1201 CAAAGATGAG TG7TTTAGTG TATTCTTTCG rrT?TT#$ fKJffifJ "CTGGGGGT 1200 

jpi CAAAGCCTCT ™ Hf iiSS 

8 TOg^ ATGGTTGTTG ?W ^ 

lip- TTTTTGGAGA M ilg8 

11 TTTACTAACG 8MJ c& g 88 
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2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TrrTfTrarJ CTTTAATGAA 2220 
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3781 ACTGGTAAGA 
3841 TCCGGTGTT7 
3901 AATTTAGGTC 
3961 TGTCTTGCGA 
4021 GAGGTTAAAA 
4081 CAGCGTCTTA 
4141 AGCGACGATT 
4201 ATTAAAAAAG 
4261 TGTTTCATCA 
4321 TGTAACTTGG 
4381 TACTGTTACT 
4441 TGTTTTACGT 
4501 TAATCCAAAC 
4561 TGATAATTCC 
4621 TTTTAAAATT 
4681 GTCTAATACT 
4741 TAGTGCACCT 
4801 AACTGACCAG 
4861 TTTTTCATTT 
4921 CCTCACCTCT 
4981 AGGGCTATCA 
5041 TATTCTTACG 
5101 TACTGGTCGT 
5161 TCAAAATGTA 
5221 TCTGGATATT 
5281 TACTAATCAA 
5341 CGGTGGCCTC 
5401 AATCCCTTTA 
5461 ATACGTGCTC 
.5521 GTGTGGTGGT 
5581 TCGCTTTCTT 
5641 GGGGGCTCCC 
5701 ATTTGGGTGA 
. 5761 CGTTGGAGTC 
5821 CTATCTCGGG 
5881 ACAGGATTTT 
5941 CCAGGCGGTG 
6001 GGCGCCCAAT 
6061 ACGACAGGTT 
6121 TCACTCATTA 
6181 TTGTGAGCGG 
6241 GTGACTGGGA 
6301 AAGCACTATT 
6361 CGCCCAGGTC 
6421 CTAGGCTGAA 
6481 TGAGTACATT 
6541 TAAATTATTC 
6601 GATCGCCCTT 
6661 GCACCAGAAG 
6721 GTCGTCCCCT 
6781 TATCCCATTA 
6841 CTCACATTTA 
6901 GGCGTTCCTA 
6961 AAATATTAAC 
7021 TCTGATTATC 
7081 ATTCTCTTGT 
7141 AAATAGCTAC 
7201 GTGATTTGAC 
7261 GCATTGCATT 
7321 CTTCTCCCGC 
7381 GCTCTGAGGC 
7441 ACGT7 

! 10 



ATTTGTATAA 
ATTCTTATTT 
AGAAGATGAA 
TTGGATTTGC 
AGGTAGTCTC 
ATCTAAGCTA 
TACAGAAGCA 
GTAATTCAAA 
TCTTCTTTTG 
TATTCAAAGC 
GTATATTCAT 
GCTAATAATT 
AATCAGGATT 
GCTCCTTCTG 
AATAACGTTC 
TCTAAATCCT 
AAAGATATTT 
ATATTGATTG 
GCTGCTGGCT 
GTTTTATCTT 
GTTCGCGCAT 
CTTTCAGGTC 
GTGACTGGTG 
GGTATTTCCA 
ACCAGCAAGG 
AGAAGTATTG 
ACTGATTATA 
ATCGGCCTCC 
GTCAAAGCAA 
TACGCGCAGC 
CCCTTCCTTT 
TTTAGGGTTC 
TGGTTCACGT 
CACGTTCTTT 
CTATTCTTTT 
CGCCTGCTGG 
AAGGGCAATC 
ACGCAAACCG 
TCCCGACTGG 
GGCACCCCAG 
ATAACAATTT 
AAACCCTGGC 
GCACTGGCAC 
CAGCTGCTCG 
GGCGATGACC 
GGCTACGCTT 
AAAAAGTTTA 
CCCAACAGTT 
CGGTGCCGGA 
CAAACTGGCA 
CGGTCAATCC 
ATGTTGATGA 
TTGGTTAAAA 
GTTTACAATT 
AACCGGGGTA 
TTGCTCCAGA 
CCTCTCCGGC 
TGTCTCCGGC 
7AAAA7A7A7 
AAAAGTATTA 
TTTATTGCTT 



CGCATAT5AT 
AACGCCTTAT 
GCTTACTAAA 
ATCAGCATTT 
TCAGACCTAT 
TCGCTATGTT 
AGGTTATTCA 
TGAAATTGTT 
CTCAGGTAAT 

A A " 
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aaTCAGGCGA 
CTGACGTTAA 
TTGATATGGT 
ATATTGATGA 
GTGGTTTCTT 
GGGCAAAGGA 
CAAATGTATT 
TAGATAACCT 
AGGGTTTGAT 
CTCAGCGTGG 
CTGCTGGTGG 
TAAAGACTAA 
AGAAGGGTTC 
AATCTGCCAA 
TGAGCGTTTT 
CCGATAGTTT 
CTACAACGGT 
AAAACACTTC 
TGTTTAGCTC- 
CCATAGTACG 
GTGACCGCTA 
CTCGCCACGT 
CGATTTAGTG 
AGTGGGCCAT 
AATAGTGGAC 
GATTTATAAG 
GGCAAACCAG 
AGCTGTTGCC 
CCTCTCCCCG 
AAAGCGGGCA 
GCTTTACACT 
CACACGCGTC 
GTTACCCAAG 
TCTTACCGTT 
AGTCAGGCCT 
CTGCTAAGGC 
GGGCTATGGT 
CGAGCAAGGC 
GCGCAGCCTG 
AAGCTGGCTG 
GATGCACGGT 
GCCGTTTGTT 
AAGCTGGCTA 
AATGAGCTGA 
TAAATATTTG 
CATATGATTG 
CTCTCAGGCA 
ATTAATTTAT 
CTTTCTCACC 
GAG6GTTCTA 
CAGGGTCATA 
AATTTTGCTA 

: 30 



ACTAAACAGG 
TTATCACACG 
ATATATTTGA 
ACATATAGTT 
GATTTTGATA 
TTCAAGGATT 
CTCACATATA 
AAATGTAATT 
TGAAATGAAT 
ATCCGTTATT 
ACCTGAAAAT 
TGGTTCAATT 
ATTGCCATCA 
TGTTCCGCAA 
TTTAATACGA 
ATCTATTGAC 
TCCTCAATTC 
ATTTGAGGT7 
CACTGTTGCA 
TTCGTTCGGT 
TAGCCATTCA 
TATCTCTGTT 
TGTAAATAAT 
TCCTGTTGCA 
GAGTTCTTCT 
TAATTTGCGT 
TCAAGATTCT 
CCGCTCTGAT 
CGCCCTGTAG 
CACTTGCCAG 
TCGCCGGCTT 
CTTTACGGCA 
CGCCCTGATA 
TCTTGTTCCA 
[Figure, continued: nucleotide sequence listing numbered in rows of 60 nucleotides, ending at position 7440.]
[Figure: nucleotide sequence listing beginning AATGCTACTA ATAGCTAAAC at position 1, numbered in rows of 60 nucleotides through approximately position 7,300.]
[Figure: nucleotide sequence listing numbered in rows of 60 nucleotides from position 1 through approximately position 3,840.]
WO 92/06204 

PCT/US91/07149

[Figure: nucleotide sequence listing, continued from approximately position 3,841 and numbered in rows of 60 nucleotides.]

FIG. 5-2






[Figure: nucleotide sequence listing beginning AATGCTACTA ATAGCTAAAC at position 1, numbered in rows of 60 nucleotides through approximately position 3,900.]